
Large language models usually need domain-specific fine-tuning before production use. This guide shows how to fine-tune efficiently with parameter-efficient techniques (PEFT).

Base models (GPT, LLaMA, etc.) are general-purpose; fine-tuning adapts them to a specific domain or task. With LoRA, you train small low-rank adapter matrices instead of the full weights:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16
)

# LoRA configuration (train only ~0.1% of the parameters!)
lora_config = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 8,388,608 || all params: 6,746,804,224 || trainable%: 0.12
```
**Key benefit:** training ~8M parameters instead of 6.7B means roughly 800× fewer trainable parameters, which makes training substantially faster and cuts optimizer and gradient memory by several times.
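That trainable-parameter count can be sanity-checked from the rank and matrix shapes alone (a back-of-envelope sketch; the 32-layer, hidden-size-4096 figures are from Llama-2-7B's published config, and the exact printout depends on which modules you target):

```python
def lora_param_count(d_in, d_out, r, n_matrices):
    # Each adapted weight W (d_out x d_in) gets two low-rank factors:
    # A (r x d_in) and B (d_out x r)
    return n_matrices * (r * d_in + d_out * r)

# Llama-2-7B: 32 layers, hidden size 4096; adapting q_proj and v_proj
# gives 2 * 32 = 64 adapted matrices at rank r=16
total = lora_param_count(4096, 4096, 16, 64)
print(total)  # 8388608  (~8.4M, roughly 0.12% of 6.7B)
```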

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-adapters",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,              # mixed precision, matching the bfloat16 base weights
    logging_steps=10,
    save_strategy="epoch",
    # ⚠️ Prevent catastrophic forgetting
    warmup_steps=100,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=domain_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```
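One detail worth calling out in this config: gradient accumulation multiplies the effective batch size without increasing per-step memory. A quick sketch of the arithmetic (assuming a single GPU):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
n_gpus = 1  # assumption: single-GPU setup

# Gradients from 4 micro-batches are summed before each optimizer step,
# so the optimizer sees an effective batch of 16 examples
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * n_gpus
print(effective_batch)  # 16
```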
⚠️ **Problem:** fine-tuning can erase the base model's general knowledge (catastrophic forgetting).

Solutions:
```python
# 1. Regularization (keep weights from drifting too far)
training_args = TrainingArguments(
    weight_decay=0.01,   # L2 regularization
    warmup_ratio=0.1,    # gradual learning-rate ramp-up
)

# 2. Mixed batching (general + domain-specific data)
import random
from torch.utils.data import ConcatDataset, Subset

# Sample ~30% of the general corpus; note that .sample(frac=...) is a
# pandas API and does not exist on torch map-style datasets
idx = random.sample(range(len(general_data)), k=int(0.3 * len(general_data)))
mixed_dataset = ConcatDataset([
    domain_specific_data,       # your new data
    Subset(general_data, idx),  # 30% general data
])

# 3. Adapter layers (LoRA already helps here:
#    only the adapters are trained, base weights stay frozen)
```
For even lower memory, QLoRA combines 4-bit quantization with LoRA, making it possible to fine-tune 70B-class models on a single workstation GPU:
```python
from transformers import BitsAndBytesConfig

# 4-bit quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",   # 70B model!
    quantization_config=bnb_config,
    device_map="auto"
)

# Apply LoRA on top of the quantized model
model = get_peft_model(model, lora_config)
# A 70B model can now be fine-tuned on a single ~48GB GPU
```
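Why this fits: a rough weight-memory estimate under 4-bit quantization (ignoring activations, LoRA parameters, optimizer state, and quantization constants, which all add overhead on top):

```python
def weight_memory_gb(n_params_billions, bits_per_param):
    # bytes = params * bits / 8; "GB" here means 10^9 bytes for simplicity
    return n_params_billions * bits_per_param / 8

print(weight_memory_gb(70, 16))  # 140.0 -> ~140 GB for 70B weights at bf16
print(weight_memory_gb(70, 4))   # 35.0  -> ~35 GB at 4-bit, hence the single-GPU claim
```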
```python
import math

def evaluate_model(model, eval_dataset):
    """Evaluate the fine-tuned model."""
    # Perplexity is exp(cross-entropy loss), not the raw loss value
    perplexity = math.exp(trainer.evaluate()['eval_loss'])
    # Domain-specific metrics (compute_domain_accuracy is user-defined)
    accuracy = compute_domain_accuracy(model, eval_dataset)
    # General knowledge retention (evaluate_on_benchmark is user-defined)
    general_score = evaluate_on_benchmark(model, "mmlu")
    return {
        'perplexity': perplexity,
        'domain_accuracy': accuracy,
        'general_knowledge': general_score,  # should stay high!
    }
```
```python
# Save only the LoRA adapters (tiny: ~16MB vs the ~13GB base model)
model.save_pretrained("./my-domain-adapter")

# Load in production
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "./my-domain-adapter")

# Use normally
outputs = model.generate(**inputs)
```
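Keeping the adapter separate adds a small matmul per adapted layer at inference time. Because the LoRA update is just a low-rank matrix, it can also be folded into the base weight for zero-overhead inference (PEFT exposes this as `merge_and_unload()`). A NumPy sketch of the underlying identity, with toy shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4                  # toy dims: hidden=8, rank=2
W = rng.standard_normal((d, d))        # frozen base weight
A = rng.standard_normal((r, d))        # LoRA down-projection
B = rng.standard_normal((d, r))        # LoRA up-projection
x = rng.standard_normal(d)

scale = alpha / r
y_adapter = W @ x + scale * (B @ (A @ x))  # adapter applied separately
W_merged = W + scale * (B @ A)             # adapter folded into the weight
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True: merging changes nothing
```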
```python
# Train separate LoRA adapters for different tasks
# (train_lora is a user-defined training loop returning an adapter path)
adapters = {
    'medical': train_lora(medical_data),
    'legal': train_lora(legal_data),
    'code': train_lora(code_data),
}

# Register each adapter once, then switch between them at runtime (PEFT API)
for name, path in adapters.items():
    model.load_adapter(path, adapter_name=name)

model.set_adapter("medical")             # medical mode
response = model.generate(medical_query)
model.set_adapter("code")                # code mode
code = model.generate(coding_query)
```
**Related Chronicles**: [Generative AI Monopoly (2053)](/articles/generative-ai-monopoly-2053)
Code: Hugging Face PEFT