Fine-Tuning LLMs: When to Fine-Tune vs Prompt Engineer


Fine-Tuning vs Prompt Engineering

Before spending time and money on fine-tuning, ask yourself: can prompt engineering solve this? In most cases, a well-crafted system prompt with few-shot examples achieves 80–90% of what fine-tuning would, with no training cost and instant iteration.

| Approach | Cost | Speed to deploy | Best for |
|---|---|---|---|
| Prompt engineering | Free | Minutes | Most tasks, rapid iteration |
| Few-shot prompting | Token cost only | Minutes | Consistent output format |
| RAG | Embedding + storage | Hours | Knowledge-intensive tasks |
| Fine-tuning | $$$ | Days | Style, tone, specialized domain |
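
To make the few-shot row concrete, here is a minimal JavaScript sketch of assembling a few-shot message array for a chat completion request. The helper name and the example data are illustrative, not from any SDK:

```javascript
// Build a chat message array that teaches the output format by example.
// `examples` is a hypothetical array of {input, output} pairs.
function buildFewShotMessages(systemPrompt, examples, userInput) {
  const messages = [{ role: 'system', content: systemPrompt }];
  for (const ex of examples) {
    messages.push({ role: 'user', content: ex.input });
    messages.push({ role: 'assistant', content: ex.output });
  }
  messages.push({ role: 'user', content: userInput });
  return messages;
}

// Example: sentiment classification with two demonstrations.
const messages = buildFewShotMessages(
  'Classify the sentiment as POSITIVE or NEGATIVE.',
  [
    { input: 'I love this product!', output: 'POSITIVE' },
    { input: 'Terrible experience.', output: 'NEGATIVE' },
  ],
  'The support team was very helpful.'
);
```

The resulting array can be passed directly as `messages` to `openai.chat.completions.create()`; each demonstration pair nudges the model toward the same output format without any training.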

When Fine-Tuning Actually Makes Sense

✓ Consistent style/tone

You need the model to always write in your brand voice, use specific terminology, or follow a rigid format that's hard to enforce with prompts alone.

✓ Reduce prompt length

You have a 2,000-token system prompt that you send on every request. Fine-tuning bakes those instructions into the model, saving tokens and cost at scale.

✓ Specialized domain

Medical, legal, or highly technical domains where the base model lacks sufficient domain knowledge and you have proprietary training data.

✓ Latency-sensitive applications

Fine-tuned smaller models (e.g., GPT-4o mini) can match larger-model quality for specific tasks at much lower latency and cost.

✓ Proprietary data patterns

Your task involves patterns that don't exist in public training data: internal code conventions, company-specific workflows, proprietary formats.
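
The "reduce prompt length" case is easy to quantify. A back-of-envelope JavaScript sketch, where the request volume and per-token price are illustrative assumptions rather than quoted figures:

```javascript
// Monthly spend attributable to resending a long system prompt on every
// request. All inputs here are illustrative assumptions.
function monthlySystemPromptCost(promptTokens, requestsPerMonth, pricePerMillionInput) {
  return (promptTokens * requestsPerMonth * pricePerMillionInput) / 1_000_000;
}

// A 2,000-token system prompt sent 1M times/month at $0.30 per 1M input tokens:
const promptSpend = monthlySystemPromptCost(2000, 1_000_000, 0.30);
// β†’ 600 (dollars per month on the system prompt alone)
```

If fine-tuning lets you drop most of that prompt, the recurring savings can repay a one-time training cost quickly at high volume.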

OpenAI Fine-Tuning: Step by Step

1. Prepare Your Dataset

OpenAI requires JSONL format with at least 10 examples (50–100+ recommended):

// training_data.jsonl β€” JSONL requires exactly one JSON object per line
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "To reset your password, visit acme.com/reset and enter your email. You'll receive a link within 2 minutes. If you don't see it, check your spam folder."}]}
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our support team is available Monday–Friday, 9 AM–6 PM EST. For urgent issues outside these hours, email urgent@acme.com."}]}
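
Because a single stray line break or malformed record will fail the upload, it can be worth sanity-checking the file locally first. A minimal JavaScript sketch; the rules below are a loose approximation of OpenAI's actual validation, not a reimplementation of it:

```javascript
import fs from 'fs';

// Check one JSONL line: must be valid JSON with a `messages` array of
// {role, content} turns using the chat roles.
function validateTrainingLine(line) {
  const record = JSON.parse(line); // throws on invalid JSON
  if (!Array.isArray(record.messages) || record.messages.length < 2) {
    throw new Error('each record needs a messages array with at least 2 turns');
  }
  const validRoles = new Set(['system', 'user', 'assistant']);
  for (const m of record.messages) {
    if (!validRoles.has(m.role) || typeof m.content !== 'string') {
      throw new Error(`invalid turn: ${JSON.stringify(m)}`);
    }
  }
  return record;
}

// Validate a whole file; returns the number of examples found.
function validateTrainingFile(path) {
  const lines = fs.readFileSync(path, 'utf8').split('\n').filter((l) => l.trim());
  return lines.map(validateTrainingLine).length;
}
```

Running `validateTrainingFile('training_data.jsonl')` before upload catches the most common formatting mistakes cheaply.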

2. Upload and Start Training

import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

// Upload training file
const file = await openai.files.create({
  file: fs.createReadStream('training_data.jsonl'),
  purpose: 'fine-tune',
});

// Start fine-tuning job
const job = await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: 'gpt-4o-mini-2024-07-18',
  hyperparameters: {
    n_epochs: 3,  // Number of training passes
  },
});

console.log('Job ID:', job.id);
// Monitor at: platform.openai.com/finetune

3. Use Your Fine-Tuned Model

const response = await openai.chat.completions.create({
  model: 'ft:gpt-4o-mini-2024-07-18:my-org:my-model:abc123',
  messages: [{ role: 'user', content: 'How do I cancel my subscription?' }],
});

LoRA and QLoRA for Open-Source Models

For open-source models (Llama 3, Mistral, Phi-3), LoRA (Low-Rank Adaptation) is the standard fine-tuning technique. Instead of updating all model weights, it trains small adapter matrices, dramatically reducing memory and compute requirements.

LoRA: Trains ~1% of parameters. Requires a GPU with 16–40 GB VRAM for 7B models. Fast training, good results.
QLoRA: LoRA + 4-bit quantization. Can fine-tune 7B models on a single 16 GB GPU (e.g., RTX 4080). Slight quality tradeoff.
Full fine-tuning: Updates all weights. Requires multiple high-end GPUs. Best quality but expensive; usually not worth it vs LoRA.
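
The "~1% of parameters" figure can be sanity-checked with arithmetic: for a weight matrix of shape d_out Γ— d_in, a rank-r LoRA adapter adds two small matrices A (r Γ— d_in) and B (d_out Γ— r), i.e. r Β· (d_in + d_out) trainable parameters. A JavaScript sketch with illustrative dimensions:

```javascript
// Trainable parameters a rank-r LoRA adapter adds for one weight matrix:
// A is (r x dIn), B is (dOut x r), so r * (dIn + dOut) total.
function loraParams(dOut, dIn, r) {
  return r * (dIn + dOut);
}

// Fraction of the original matrix's parameters that LoRA trains.
function loraFraction(dOut, dIn, r) {
  return loraParams(dOut, dIn, r) / (dOut * dIn);
}

// For a square 4096x4096 projection at rank 8:
const frac = loraFraction(4096, 4096, 8);
// β†’ 0.00390625, i.e. ~0.4% of that matrix's weights
```

Summed over all adapted matrices in a 7B model, typical ranks (8–64) land in the low single-digit percent range, which is where the "~1%" rule of thumb comes from.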

Fine-Tuning Cost Estimates

| Model | Training cost | Inference cost |
|---|---|---|
| GPT-4o mini (fine-tuned) | $3.00 / 1M tokens | $0.30 input / $1.20 output per 1M |
| GPT-3.5 Turbo (fine-tuned) | $8.00 / 1M tokens | $3.00 input / $6.00 output per 1M |
| Llama 3 8B (QLoRA, cloud GPU) | ~$5–20 for 1K examples | Self-hosted or ~$0.10 / 1M tokens |
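
Training cost scales with total trained tokens: dataset tokens times epochs. A rough estimator in JavaScript, where the dataset size is an illustrative assumption and the price matches the GPT-4o mini row above:

```javascript
// Rough fine-tuning cost: dataset tokens x epochs x price per token.
function trainingCostUSD(datasetTokens, epochs, pricePerMillion) {
  return (datasetTokens * epochs * pricePerMillion) / 1_000_000;
}

// 100 examples averaging ~500 tokens each, 3 epochs, at $3.00 per 1M tokens:
const cost = trainingCostUSD(100 * 500, 3, 3.0);
// β†’ 0.45 (dollars)
```

Even a few hundred examples over several epochs usually costs well under a dollar on GPT-4o mini; the dataset preparation, not the training bill, is typically the expensive part.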
