How to Use the OpenAI API: Complete Beginner Guide 2025

📖 15 min read · AI & Machine Learning

Getting Started

The OpenAI API lets you integrate GPT-4o, GPT-4o mini, DALL-E, Whisper, and other models directly into your applications. This guide covers everything from your first API call to production-ready patterns.

1. Create an account at platform.openai.com
2. Go to API Keys → Create new secret key
3. Add billing information (pay-as-you-go)
4. Install the SDK: npm install openai
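Once the key is in your environment, it helps to fail fast if it's missing instead of getting a confusing 401 later. A minimal sketch (the hypothetical `assertApiKey` helper and the `sk-` prefix check reflect the current key format, not a guarantee for future formats):

```typescript
// Fail fast if the API key is missing or obviously malformed.
function assertApiKey(key: string | undefined): string {
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set. Add it to your environment or .env file.');
  }
  if (!key.startsWith('sk-')) {
    throw new Error('OPENAI_API_KEY does not look like an OpenAI key (expected "sk-" prefix).');
  }
  return key;
}

// Usage: const apiKey = assertApiKey(process.env.OPENAI_API_KEY);
```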

Your First API Call

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // Never hardcode this!
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',       // Cheapest capable model
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  max_tokens: 100,
  temperature: 0.3,
});

console.log(response.choices[0].message.content);
// Output: "The capital of France is Paris."
โš ๏ธ Never expose your API key in client-side code. Always call the OpenAI API from your backend (Node.js server, API route, serverless function).

Understanding the Response Object

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"   // "stop" | "length" | "tool_calls"
  }],
  "usage": {
    "prompt_tokens": 24,      // Tokens in your input
    "completion_tokens": 9,   // Tokens in the response
    "total_tokens": 33        // Total billed tokens
  }
}
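The usage block is what you're billed on, so it's worth turning those counts into dollars. A quick sketch; the prices are assumptions based on gpt-4o-mini list pricing at the time of writing, so check the current pricing page before relying on them:

```typescript
// Rough cost estimate from the usage block, assuming gpt-4o-mini pricing of
// $0.15 per 1M input tokens and $0.60 per 1M output tokens.
function estimateCostUSD(promptTokens: number, completionTokens: number): number {
  const INPUT_PER_M = 0.15;
  const OUTPUT_PER_M = 0.6;
  return (promptTokens / 1e6) * INPUT_PER_M + (completionTokens / 1e6) * OUTPUT_PER_M;
}

// For the response above: 24 prompt tokens + 9 completion tokens
const cost = estimateCostUSD(24, 9); // ≈ $0.000009, a tiny fraction of a cent
```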

Streaming Responses

For better UX, stream the response token by token instead of waiting for the full completion:

const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Write a haiku about coding.' }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(delta); // Print each token as it arrives
}
// Output streams word by word in real-time
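The loop above prints tokens as they arrive, but you usually also want the full text afterwards (for logging or caching). One approach is a small accumulator, written here against a generic chunk shape so it can be exercised without a live stream; the shape is assumed to match the SDK's stream chunks:

```typescript
// Minimal shape of a chat-completion stream chunk, as used below.
type ChunkLike = { choices: { delta?: { content?: string } }[] };

// Collect streamed deltas into the full completion text.
async function collectStream(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk.choices[0]?.delta?.content ?? '';
  }
  return full;
}
```

In practice you would both write each delta to the UI and append it to the accumulator inside the same loop.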

Function Calling (Tool Use)

Function calling lets the model trigger actions in your code, such as searching a database, calling an API, or running a calculation:

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city'],
    },
  },
}];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
  tools,
  tool_choice: 'auto',
});

// If model wants to call a function:
if (response.choices[0].finish_reason === 'tool_calls') {
  const toolCall = response.choices[0].message.tool_calls[0];
  const args = JSON.parse(toolCall.function.arguments);
  // args = { city: 'Tokyo', unit: 'celsius' }
  // Now call your actual weather API...
}
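After running the function, you send its result back in a second request as a role: 'tool' message (with the matching tool_call_id) so the model can produce the final answer. A sketch of the dispatch step, with a hypothetical `toolHandlers` map and a canned `get_weather` handler standing in for a real weather API:

```typescript
// Minimal shape of a tool call from the response, as used below.
type ToolCall = { id: string; function: { name: string; arguments: string } };

// Map tool names to local implementations. The canned weather data here is a
// placeholder; a real handler would call an actual weather service.
const toolHandlers: Record<string, (args: any) => unknown> = {
  get_weather: ({ city, unit = 'celsius' }) => ({ city, unit, temp: 21 }),
};

function runToolCall(call: ToolCall): string {
  const handler = toolHandlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  const result = handler(JSON.parse(call.function.arguments));
  // Send this back to the model as:
  // { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) }
  return JSON.stringify(result);
}
```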

Embeddings API

Convert text to vector embeddings for semantic search, clustering, and RAG pipelines:

const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',  // $0.02 per 1M tokens
  input: 'The quick brown fox jumps over the lazy dog',
});

const vector = embedding.data[0].embedding;
// Returns array of 1536 numbers
// e.g. [0.0023, -0.0045, 0.0123, ...]

// Use cosine similarity to find similar texts:
function cosineSimilarity(a: number[], b: number[]) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (magA * magB);
}
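Building on that, a minimal semantic-search step ranks stored documents against a query embedding. OpenAI's text-embedding-3 vectors are unit-length, so the plain dot product equals cosine similarity; this sketch (with hypothetical `dot` and `mostSimilar` helpers) uses that shortcut:

```typescript
// Dot product; for unit-length vectors this equals cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Return the text of the document whose embedding is closest to the query.
function mostSimilar(query: number[], docs: { text: string; embedding: number[] }[]): string {
  if (docs.length === 0) throw new Error('no documents to search');
  let best = docs[0];
  for (const doc of docs) {
    if (dot(query, doc.embedding) > dot(query, best.embedding)) best = doc;
  }
  return best.text;
}
```

A real RAG pipeline would store embeddings in a vector database rather than scanning an array, but the ranking logic is the same.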

Cost Management Tips

✓ Use gpt-4o-mini for most tasks

95% cheaper than GPT-4o, handles most tasks well. Only upgrade when you need complex reasoning.

✓ Set max_tokens limits

Always set max_tokens to prevent runaway costs from unexpectedly long responses.

✓ Cache common responses

If users ask the same questions, cache responses in Redis or a database. OpenAI also offers prompt caching for long system prompts.

✓ Count tokens before sending

Use tiktoken to count tokens before making API calls. Helps you stay within budget and context limits.
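tiktoken gives exact counts; as a dependency-free fallback, a common rule of thumb is roughly 4 characters per token for English text. A sketch of that heuristic (an approximation only, not the real tokenizer):

```typescript
// Rough token estimate: ~4 characters per token for typical English text.
// Use tiktoken for exact counts before production calls.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```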

✓ Set spending limits

In the OpenAI dashboard, set monthly spending limits and usage alerts to avoid surprise bills.

Error Handling

import { OpenAI, APIError } from 'openai';

try {
  const response = await openai.chat.completions.create({ ... });
} catch (error) {
  if (error instanceof APIError) {
    switch (error.status) {
      case 401: console.error('Invalid API key'); break;
      case 429: console.error('Rate limit hit: implement exponential backoff'); break;
      case 500: console.error('OpenAI server error: retry after delay'); break;
      case 503: console.error('Service overloaded: retry with backoff'); break;
    }
  }
}
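For the retryable cases (429, 5xx), one way to implement the backoff is a small wrapper around the request. A sketch with a hypothetical `withBackoff` helper; the delay base, cap on retries, and which statuses count as retryable are all tunable assumptions:

```typescript
// Retry an async request with exponential backoff on 429 and 5xx errors.
// Delays grow as baseDelayMs * 2^attempt: 1s, 2s, 4s, ...
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt >= maxRetries) throw error;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const response = await withBackoff(() =>
//   openai.chat.completions.create({ model: 'gpt-4o-mini', messages }));
```

Note that the official SDK already retries some failures automatically (configurable via its maxRetries option), so a wrapper like this is mainly useful for custom policies.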
