Fine-Tuning LLMs: When to Fine-Tune vs Prompt Engineer
14 min read · AI & Machine Learning
Fine-Tuning vs Prompt Engineering
Before spending time and money on fine-tuning, ask yourself: can prompt engineering solve this? In most cases, a well-crafted system prompt with few-shot examples achieves 80–90% of what fine-tuning would, at zero cost and with instant iteration.
| Approach | Cost | Speed to deploy | Best for |
|---|---|---|---|
| Prompt engineering | Free | Minutes | Most tasks, rapid iteration |
| Few-shot prompting | Token cost only | Minutes | Consistent output format |
| RAG | Embedding + storage | Hours | Knowledge-intensive tasks |
| Fine-tuning | $$$ | Days | Style, tone, specialized domain |
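Few-shot prompting from the table above needs no training step: you prepend worked input/output pairs to each request. A minimal sketch of assembling such a request, using the Chat Completions message shape (the `buildFewShotMessages` helper and the example pairs are hypothetical, not part of any SDK):

```typescript
// Build a few-shot message array for the Chat Completions API.
// buildFewShotMessages is a hypothetical helper for illustration.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

function buildFewShotMessages(
  system: string,
  examples: Array<[string, string]>, // [user input, ideal assistant reply]
  query: string,
): Msg[] {
  const msgs: Msg[] = [{ role: 'system', content: system }];
  for (const [input, reply] of examples) {
    msgs.push({ role: 'user', content: input });      // demonstration input
    msgs.push({ role: 'assistant', content: reply }); // demonstration output
  }
  msgs.push({ role: 'user', content: query }); // the real question comes last
  return msgs;
}

const messages = buildFewShotMessages(
  'You are a customer support agent for Acme Corp.',
  [['How do I reset my password?', 'Visit acme.com/reset and enter your email.']],
  'What are your business hours?',
);
console.log(messages.length); // system + one example pair + query = 4 messages
```

The resulting array can be passed directly as the `messages` parameter of `openai.chat.completions.create`.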
When Fine-Tuning Actually Makes Sense
- **Brand voice and format:** You need the model to always write in your brand voice, use specific terminology, or follow a rigid format that's hard to enforce with prompts alone.
- **Prompt compression:** You have a 2,000-token system prompt that you send on every request. Fine-tuning bakes those instructions into the model, saving tokens and cost at scale.
- **Domain expertise:** Medical, legal, or highly technical domains where the base model lacks sufficient domain knowledge and you have proprietary training data.
- **Smaller, faster models:** Fine-tuned smaller models (e.g., GPT-4o mini) can match larger-model quality for specific tasks at much lower latency and cost.
- **Proprietary patterns:** Your task involves patterns that don't exist in public training data: internal code conventions, company-specific workflows, proprietary formats.
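The prompt-compression case is easy to quantify. A back-of-envelope sketch, where the request volume and the per-token price are illustrative assumptions rather than quotes:

```typescript
// Tokens saved by baking a long system prompt into a fine-tuned model.
// All numbers below are illustrative assumptions.
const systemPromptTokens = 2_000; // prompt currently sent on every request
const requestsPerMonth = 100_000; // assumed traffic

const tokensSavedPerMonth = systemPromptTokens * requestsPerMonth;
console.log(tokensSavedPerMonth); // 200,000,000 input tokens no longer sent

// At an assumed $0.15 per 1M input tokens, the monthly spend removed:
const assumedInputPricePerM = 0.15;
const monthlySavings = (tokensSavedPerMonth / 1_000_000) * assumedInputPricePerM;
console.log(monthlySavings.toFixed(2)); // "30.00"
```

Note that fine-tuned models are often priced higher per token than their base versions, so compare end-to-end cost, not just the tokens saved.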
OpenAI Fine-Tuning: Step by Step
1. Prepare Your Dataset
OpenAI requires JSONL format with at least 10 examples (50–100+ recommended):
// training_data.jsonl (each training example is one JSON object per line)
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "To reset your password, visit acme.com/reset and enter your email. You'll receive a link within 2 minutes. If you don't see it, check your spam folder."}]}
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our support team is available Monday–Friday, 9 AM–6 PM EST. For urgent issues outside these hours, email urgent@acme.com."}]}
2. Upload and Start Training
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI();
// Upload training file
const file = await openai.files.create({
file: fs.createReadStream('training_data.jsonl'),
purpose: 'fine-tune',
});
// Start fine-tuning job
const job = await openai.fineTuning.jobs.create({
training_file: file.id,
model: 'gpt-4o-mini-2024-07-18',
hyperparameters: {
n_epochs: 3, // Number of training passes
},
});
console.log('Job ID:', job.id);
// Monitor at: platform.openai.com/finetune
3. Use Your Fine-Tuned Model
const response = await openai.chat.completions.create({
model: 'ft:gpt-4o-mini-2024-07-18:my-org:my-model:abc123',
messages: [{ role: 'user', content: 'How do I cancel my subscription?' }],
});
LoRA and QLoRA for Open-Source Models
For open-source models (Llama 3, Mistral, Phi-3), LoRA (Low-Rank Adaptation) is the standard fine-tuning technique. Instead of updating all model weights, it trains small adapter matrices, dramatically reducing memory and compute requirements. QLoRA goes further by quantizing the frozen base weights to 4-bit, which is what makes fine-tuning a 7–8B model feasible on a single consumer GPU.
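To see why LoRA is so much cheaper, compare trainable-parameter counts for a single weight matrix. A sketch with illustrative dimensions (typical of a 7–8B model's attention projections; the rank `r` is a tunable hyperparameter):

```typescript
// Trainable parameters: full fine-tune vs. LoRA adapters for one d×k matrix.
const d = 4096; // output dimension (illustrative)
const k = 4096; // input dimension (illustrative)
const r = 16;   // LoRA rank, commonly 4-64

const fullParams = d * k;         // full fine-tune updates the whole matrix W
const loraParams = d * r + r * k; // LoRA trains only B (d×r) and A (r×k); W stays frozen

console.log(fullParams); // 16,777,216
console.log(loraParams); // 131,072
console.log(((loraParams / fullParams) * 100).toFixed(2) + '%'); // "0.78%"
```

The adapter update is W + BA, so at inference time the adapters can be merged back into W with no latency penalty.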
Fine-Tuning Cost Estimates
| Model | Training cost | Inference cost |
|---|---|---|
| GPT-4o mini (fine-tuned) | $3.00 / 1M tokens | $0.30 input / $1.20 output per 1M |
| GPT-3.5 Turbo (fine-tuned) | $8.00 / 1M tokens | $3.00 input / $6.00 output per 1M |
| Llama 3 8B (QLoRA, cloud GPU) | ~$5–20 for 1K examples | Self-hosted or ~$0.10/1M tokens |
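OpenAI bills fine-tuning on the tokens in your training file multiplied by the number of epochs. A back-of-envelope estimator, where the dataset size and average example length are illustrative assumptions:

```typescript
// Estimate OpenAI fine-tuning cost: billed tokens = dataset tokens × epochs.
const examples = 100;            // illustrative dataset size
const avgTokensPerExample = 500; // illustrative assumption; measure your real data
const epochs = 3;                // matches n_epochs in the job above
const pricePerMTokens = 3.0;     // GPT-4o mini training price per 1M tokens

const billedTokens = examples * avgTokensPerExample * epochs;
const cost = (billedTokens / 1_000_000) * pricePerMTokens;
console.log(billedTokens);    // 150,000
console.log(cost.toFixed(2)); // "0.45"
```

Even tripling the dataset keeps a GPT-4o mini training run under a few dollars; inference cost at production volume usually dominates.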