lora

Included with Lifetime

$97 forever

Parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA). Use when fine-tuning large language models with limited GPU memory, creating task-specific adapters, or when you need to train multiple specialized models from a single base.

Productivity

What this skill does


# Using LoRA for Fine-tuning

LoRA (Low-Rank Adaptation) enables efficient fine-tuning by freezing pretrained weights and injecting small trainable matrices into transformer layers. This reduces trainable parameters to ~0.1% of the original model while maintaining performance.

## Table of Contents

- [Core Concepts](#core-concepts)
- [Basic Setup](#basic-setup)
- [Configuration Parameters](#configuration-parameters)
- [QLoRA (Quantized LoRA)](#qlora-quantized-lora)
- [Training Patterns](#training-patterns)
- [Saving and Loading](#saving-and-loading)
- [Merging Adapters](#merging-adapters)
- [Best Practices](#best-practices)
- [References](#references)

## Core Concepts

### How LoRA Works

Instead of updating all weights during fine-tuning, LoRA decomposes weight updates into low-rank matrices:

```
W' = W + BA
```

Where:
- `W` is the frozen pretrained weight matrix (d × k)
- `B` is a trainable matrix (d × r)
- `A` is a trainable matrix (r × k)
- `r` is the rank, much smaller than d and k

The key insight: weight updates during fine-tuning have low intrinsic rank, so we can represent them efficiently with smaller matrices.

### Why Use LoRA

| Aspect | Full Fine-tuning | LoRA |
|--------|------------------|------|
| Trainable params | 100% | ~0.1-1% |
| Memory usage | High | Low |
| Adapter size | Full model | ~3-100 MB |
| Training speed | Slower | Faster |
| Multiple tasks | Separate models | Swap adapters |

## Basic Setup

### Installation

```bash
pip install peft transformers accelerate
```

### Minimal Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
import torch

# Load base model
model_name = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 3,407,872 || all params: 1,238,300,672 || trainable%: 0.28%
```

## Configuration Parameters

### LoraConfig Options

```python
from peft import LoraConfig, TaskType

config = LoraConfig(
    # Core parameters
    r=16,                          # Rank of update matrices
    lora_alpha=32,                 # Scaling factor (alpha/r applied to updates)
    target_modules=["q_proj", "v_proj"],  # Layers to adapt

    # Regularization
    lora_dropout=0.05,             # Dropout on LoRA layers
    bias="none",                   # "none", "all", or "lora_only"

    # Task configuration
    task_type=TaskType.CAUSAL_LM,  # CAUSAL_LM, SEQ_CLS, SEQ_2_SEQ_LM, TOKEN_CLS

    # Advanced
    modules_to_save=None,          # Additional modules to train (e.g., ["lm_head"])
    layers_to_transform=None,      # Specific layer indices to adapt
    rank_pattern=None,             # Per-module rank overrides
    alpha_pattern=None,            # Per-module alpha overrides
    trainable_token_indices=None,  # Train only selected new token embeddings
    target_parameters=None,        # Target nn.Parameter weights in some MoE layers
    use_rslora=False,              # Rank-stabilized LoRA scaling
    use_dora=False,                # Weight-Decomposed LoRA
)
```

### Target Modules by Architecture

```python
# Llama, Mistral, Qwen
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# GPT-2, GPT-J
target_modules = ["c_attn", "c_proj", "c_fc"]

# BERT, RoBERTa
target_modules = ["query", "key", "value", "dense"]

# Falcon
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]

# Phi
target_modules = ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]

# MoE expert parameters stored as nn.Parameter tensors
target_parameters = ["feed_forward.experts.gate_up_proj", "feed_forward.experts.down_proj"]
```

For MoE models where expert weights are not `nn.Linear` modules, use `target_parameters` rather than `target_modules`. Merge adapters before latency-sensitive inference because materializing expert LoRA updates can add overhead.

### Finding Target Modules

```python
# Print all linear layer names
from peft.utils import get_peft_model_state_dict

def find_target_modules(model):
    linear_modules = set()
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            # Get the last part of the name (e.g., "q_proj" from "model.layers.0.self_attn.q_proj")
            layer_name = name.split(".")[-1]
            linear_modules.add(layer_name)
    return list(linear_modules)

print(find_target_modules(model))
```

## QLoRA (Quantized LoRA)

QLoRA combines 4-bit quantization with LoRA, enabling fine-tuning of large models on consumer GPUs.

### Setup

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # Normalized float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,       # Nested quantization
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model)

# Apply LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, lora_config)
```

### Memory Requirements

| Model Size | Full FT (16-bit) | LoRA (16-bit) | QLoRA (4-bit) |
|------------|------------------|---------------|---------------|
| 7B         | ~60 GB           | ~16 GB        | ~6 GB         |
| 13B        | ~104 GB          | ~28 GB        | ~10 GB        |
| 70B        | ~560 GB          | ~160 GB       | ~48 GB        |

## Training Patterns

### With Hugging Face Trainer

```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
from datasets import load_dataset

# Prepare dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def format_prompt(example):
    if example["input"]:
        text = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}"
    else:
        text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    return {"text": text}

dataset = dataset.map(format_prompt)

def tokenize(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding=False,
    )

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Training arguments (note higher learning rate)
training_args = TrainingArguments(
    output_dir="./lora-output",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,              # Higher than full fine-tuning
    bf16=True,
    logging_steps=10,
    save_steps=500,
    warmup_ratio=0.03,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()
```

### With SFTTrainer (TRL)

```python
from trl import SFTTrainer, SFTConfig

sft_config = SFTConfig(
    output_dir="./sft-lora",
    max_length=1024,
    dataset_text_

Files: 2

Size: 21.6 KB

Complexity: 35/100

Category: Productivity

Source: https://github.com/itsmostafa/llm-engineering-skills/tree/main/skills/lora

Related in Productivity

gitea-workflow

Included

Orchestrate agile development workflows for Gitea repositories using the tea CLI. Use when working with Gitea-hosted repos and asking to 'run the workflow', 'continue working', 'what's next', 'complete the task cycle', 'start my day', 'end the sprint', 'implement the next task', or wanting guided step-by-step development assistance. Keywords: workflow, orchestrate, agile, task cycle, sprint, daily, implement, review, PR, standup, retrospective, gitea, tea.

Productivityscripts

microsoft-graph-gateway

Included

Route Microsoft Graph work in this workspace. Use when users want to read or write Outlook mail, calendar events, contacts, OneDrive or SharePoint files, Teams, Planner, To Do, users, groups, directory data, or arbitrary Microsoft Graph endpoints from VS Code. Prefer WorkIQ for common read scenarios. Use Microsoft Graph for write actions and gap-read scenarios that need exact Graph properties, filters, permissions, or endpoints.

Productivityscripts

copilotkit

Included

Use when building with CopilotKit — setup, development, integrations, debugging, upgrading, or contributing. Routes to the appropriate specialized skill based on the task.

Productivityscripts

wordly-wisdom

Included

Provides calibrated decision analysis using Charlie Munger-style multiple mental models, inversion, incentive mapping, circle-of-competence checks, misjudgment audits, second-order effects, and forecast updates. Use when the user asks for an oracle take, a hard call, a decision memo, a premortem, an outside view, a red-team, a sanity-check, what am I missing, think this through, or wants a strategy, hire, investment, plan, product, partnership, or major life choice analysed. Avoid for simple factual lookups or time-sensitive legal, medical, or market questions without fresh evidence.

Productivityscripts

swain-session

Included

Session management and project status dashboard. Owns the full session lifecycle (start/work/close/resume), focus lane, bookmarks, worktree detection, and tab naming. Also serves as the project status dashboard — shows active epics, progress, actionable next steps, blocked items, tasks, GitHub issues, and recommendations. Worktree creation is deferred to swain-do task dispatch (SPEC-195). Triggers on: 'session', 'status', 'what's next', 'dashboard', 'overview', 'where are we', 'what should I work on', 'show me priorities', 'bookmark', 'focus on', 'session info'.

Productivityscripts

gandi

Included

Comprehensive Gandi domain registrar integration for domain and DNS management. Register and manage domains, create/update/delete DNS records (A, AAAA, CNAME, MX, TXT, SRV, and more), configure email forwarding and aliases, check SSL certificate status, create DNS snapshots for safe rollback, bulk update zone files, and monitor domain expiration. Supports multi-domain management, zone file import/export, and automated DNS backups. Includes both read-only and destructive operations with safety controls.

Productivityscripts

Use when building with CopilotKit — setup, development, integrations, debugging, upgrading, or contributing. Routes to the appropriate specialized skill based on the task.

Productivityscripts

wordly-wisdom

Included

Productivityscripts

swain-session

Included

Productivityscripts

gandi

Included

Productivityscripts