Fine-Tuning LLMs: When to Fine-Tune vs Prompt Engineer
14 min read · AI & Machine Learning
Fine-Tuning vs Prompt Engineering
Before spending time and money on fine-tuning, ask yourself: can prompt engineering solve this? In most cases, a well-crafted system prompt with few-shot examples achieves 80–90% of what fine-tuning would, at zero cost and with instant iteration.
| Approach | Cost | Speed to deploy | Best for |
|---|---|---|---|
| Prompt engineering | Free | Minutes | Most tasks, rapid iteration |
| Few-shot prompting | Token cost only | Minutes | Consistent output format |
| RAG | Embedding + storage | Hours | Knowledge-intensive tasks |
| Fine-tuning | $$$ | Days | Style, tone, specialized domain |
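Few-shot prompting from the table above needs no training step: you prepend worked input/output pairs to each request. A minimal sketch of assembling such a request, using the Chat Completions message shape (the `buildFewShotMessages` helper and the example pairs are hypothetical, not part of any SDK):

```typescript
// Build a few-shot message array for the Chat Completions API.
// buildFewShotMessages is a hypothetical helper for illustration.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

function buildFewShotMessages(
  system: string,
  examples: Array<[string, string]>, // [user input, ideal assistant reply]
  query: string,
): Msg[] {
  const msgs: Msg[] = [{ role: 'system', content: system }];
  for (const [input, reply] of examples) {
    msgs.push({ role: 'user', content: input });      // demonstration input
    msgs.push({ role: 'assistant', content: reply }); // demonstration output
  }
  msgs.push({ role: 'user', content: query }); // the real question comes last
  return msgs;
}

const messages = buildFewShotMessages(
  'You are a customer support agent for Acme Corp.',
  [['How do I reset my password?', 'Visit acme.com/reset and enter your email.']],
  'What are your business hours?',
);
console.log(messages.length); // system + one example pair + query = 4 messages
```

The resulting array can be passed directly as the `messages` parameter of `openai.chat.completions.create`.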
When Fine-Tuning Actually Makes Sense
- **Brand voice and format:** You need the model to always write in your brand voice, use specific terminology, or follow a rigid format that's hard to enforce with prompts alone.
- **Prompt compression:** You have a 2,000-token system prompt that you send on every request. Fine-tuning bakes those instructions into the model, saving tokens and cost at scale.
- **Domain expertise:** Medical, legal, or highly technical domains where the base model lacks sufficient domain knowledge and you have proprietary training data.
- **Smaller, faster models:** Fine-tuned smaller models (e.g., GPT-4o mini) can match larger-model quality for specific tasks at much lower latency and cost.
- **Proprietary patterns:** Your task involves patterns that don't exist in public training data: internal code conventions, company-specific workflows, proprietary formats.
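The prompt-compression case is easy to quantify. A back-of-envelope sketch, where the request volume and the per-token price are illustrative assumptions rather than quotes:

```typescript
// Tokens saved by baking a long system prompt into a fine-tuned model.
// All numbers below are illustrative assumptions.
const systemPromptTokens = 2_000; // prompt currently sent on every request
const requestsPerMonth = 100_000; // assumed traffic

const tokensSavedPerMonth = systemPromptTokens * requestsPerMonth;
console.log(tokensSavedPerMonth); // 200,000,000 input tokens no longer sent

// At an assumed $0.15 per 1M input tokens, the monthly spend removed:
const assumedInputPricePerM = 0.15;
const monthlySavings = (tokensSavedPerMonth / 1_000_000) * assumedInputPricePerM;
console.log(monthlySavings.toFixed(2)); // "30.00"
```

Note that fine-tuned models are often priced higher per token than their base versions, so compare end-to-end cost, not just the tokens saved.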
OpenAI Fine-Tuning: Step by Step
1. Prepare Your Dataset
OpenAI requires JSONL format with at least 10 examples (50–100+ recommended):
// training_data.jsonl (each training example is one JSON object per line)
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "To reset your password, visit acme.com/reset and enter your email. You'll receive a link within 2 minutes. If you don't see it, check your spam folder."}]}
{"messages": [{"role": "system", "content": "You are a customer support agent for Acme Corp."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our support team is available Monday–Friday, 9 AM–6 PM EST. For urgent issues outside these hours, email urgent@acme.com."}]}
2. Upload and Start Training
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI();
// Upload training file
const file = await openai.files.create({
file: fs.createReadStream('training_data.jsonl'),
purpose: 'fine-tune',
});
// Start fine-tuning job
const job = await openai.fineTuning.jobs.create({
training_file: file.id,
model: 'gpt-4o-mini-2024-07-18',
hyperparameters: {
n_epochs: 3, // Number of training passes
},
});
console.log('Job ID:', job.id);
// Monitor at: platform.openai.com/finetune
3. Use Your Fine-Tuned Model
const response = await openai.chat.completions.create({
model: 'ft:gpt-4o-mini-2024-07-18:my-org:my-model:abc123',
messages: [{ role: 'user', content: 'How do I cancel my subscription?' }],
});
LoRA and QLoRA for Open-Source Models
For open-source models (Llama 3, Mistral, Phi-3), LoRA (Low-Rank Adaptation) is the standard fine-tuning technique. Instead of updating all model weights, it trains small adapter matrices, dramatically reducing memory and compute requirements. QLoRA goes further by quantizing the frozen base weights to 4-bit, which is what makes fine-tuning a 7–8B model feasible on a single consumer GPU.
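To see why LoRA is so much cheaper, compare trainable-parameter counts for a single weight matrix. A sketch with illustrative dimensions (typical of a 7–8B model's attention projections; the rank `r` is a tunable hyperparameter):

```typescript
// Trainable parameters: full fine-tune vs. LoRA adapters for one d×k matrix.
const d = 4096; // output dimension (illustrative)
const k = 4096; // input dimension (illustrative)
const r = 16;   // LoRA rank, commonly 4-64

const fullParams = d * k;         // full fine-tune updates the whole matrix W
const loraParams = d * r + r * k; // LoRA trains only B (d×r) and A (r×k); W stays frozen

console.log(fullParams); // 16,777,216
console.log(loraParams); // 131,072
console.log(((loraParams / fullParams) * 100).toFixed(2) + '%'); // "0.78%"
```

The adapter update is W + BA, so at inference time the adapters can be merged back into W with no latency penalty.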
Fine-Tuning Cost Estimates
| Model | Training cost | Inference cost |
|---|---|---|
| GPT-4o mini (fine-tuned) | $3.00 / 1M tokens | $0.30 input / $1.20 output per 1M |
| GPT-3.5 Turbo (fine-tuned) | $8.00 / 1M tokens | $3.00 input / $6.00 output per 1M |
| Llama 3 8B (QLoRA, cloud GPU) | ~$5–20 for 1K examples | Self-hosted or ~$0.10/1M tokens |
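OpenAI bills fine-tuning on the tokens in your training file multiplied by the number of epochs. A back-of-envelope estimator, where the dataset size and average example length are illustrative assumptions:

```typescript
// Estimate OpenAI fine-tuning cost: billed tokens = dataset tokens × epochs.
const examples = 100;            // illustrative dataset size
const avgTokensPerExample = 500; // illustrative assumption; measure your real data
const epochs = 3;                // matches n_epochs in the job above
const pricePerMTokens = 3.0;     // GPT-4o mini training price per 1M tokens

const billedTokens = examples * avgTokensPerExample * epochs;
const cost = (billedTokens / 1_000_000) * pricePerMTokens;
console.log(billedTokens);    // 150,000
console.log(cost.toFixed(2)); // "0.45"
```

Even tripling the dataset keeps a GPT-4o mini training run under a few dollars; inference cost at production volume usually dominates.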