How to Use the OpenAI API: Complete Beginner Guide 2025

📖 15 min read · AI & Machine Learning

Getting Started

The OpenAI API lets you integrate GPT-4o, GPT-4o mini, DALL-E, Whisper, and other models directly into your applications. This guide covers everything from your first API call to production-ready patterns.

1. Create an account at platform.openai.com
2. Go to API Keys → Create new secret key
3. Add billing information (pay-as-you-go)
4. Install the SDK: npm install openai
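Once the key is in your environment, it helps to fail fast if it's missing instead of getting a confusing 401 later. A minimal sketch (the hypothetical `assertApiKey` helper and the `sk-` prefix check reflect the current key format, not a guarantee for future formats):

```typescript
// Fail fast if the API key is missing or obviously malformed.
function assertApiKey(key: string | undefined): string {
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set. Add it to your environment or .env file.');
  }
  if (!key.startsWith('sk-')) {
    throw new Error('OPENAI_API_KEY does not look like an OpenAI key (expected "sk-" prefix).');
  }
  return key;
}

// Usage: const apiKey = assertApiKey(process.env.OPENAI_API_KEY);
```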

Your First API Call

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // Never hardcode this!
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',       // Cheapest capable model
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  max_tokens: 100,
  temperature: 0.3,
});

console.log(response.choices[0].message.content);
// Output: "The capital of France is Paris."
โš ๏ธ Never expose your API key in client-side code. Always call the OpenAI API from your backend (Node.js server, API route, serverless function).

Understanding the Response Object

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"   // "stop" | "length" | "tool_calls"
  }],
  "usage": {
    "prompt_tokens": 24,      // Tokens in your input
    "completion_tokens": 9,   // Tokens in the response
    "total_tokens": 33        // Total billed tokens
  }
}
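The usage block is what you're billed on, so it's worth turning those counts into dollars. A quick sketch; the prices are assumptions based on gpt-4o-mini list pricing at the time of writing, so check the current pricing page before relying on them:

```typescript
// Rough cost estimate from the usage block, assuming gpt-4o-mini pricing of
// $0.15 per 1M input tokens and $0.60 per 1M output tokens.
function estimateCostUSD(promptTokens: number, completionTokens: number): number {
  const INPUT_PER_M = 0.15;
  const OUTPUT_PER_M = 0.6;
  return (promptTokens / 1e6) * INPUT_PER_M + (completionTokens / 1e6) * OUTPUT_PER_M;
}

// For the response above: 24 prompt tokens + 9 completion tokens
const cost = estimateCostUSD(24, 9); // ≈ $0.000009, a tiny fraction of a cent
```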

Streaming Responses

For better UX, stream the response token by token instead of waiting for the full completion:

const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Write a haiku about coding.' }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(delta); // Print each token as it arrives
}
// Output streams word by word in real-time
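The loop above prints tokens as they arrive, but you usually also want the full text afterwards (for logging or caching). One approach is a small accumulator, written here against a generic chunk shape so it can be exercised without a live stream; the shape is assumed to match the SDK's stream chunks:

```typescript
// Minimal shape of a chat-completion stream chunk, as used below.
type ChunkLike = { choices: { delta?: { content?: string } }[] };

// Collect streamed deltas into the full completion text.
async function collectStream(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk.choices[0]?.delta?.content ?? '';
  }
  return full;
}
```

In practice you would both write each delta to the UI and append it to the accumulator inside the same loop.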

Function Calling (Tool Use)

Function calling lets the model trigger actions in your code, such as searching a database, calling an API, or running a calculation:

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city'],
    },
  },
}];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
  tools,
  tool_choice: 'auto',
});

// If model wants to call a function:
if (response.choices[0].finish_reason === 'tool_calls') {
  const toolCall = response.choices[0].message.tool_calls[0];
  const args = JSON.parse(toolCall.function.arguments);
  // args = { city: 'Tokyo', unit: 'celsius' }
  // Now call your actual weather API...
}
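After running the function, you send its result back in a second request as a role: 'tool' message (with the matching tool_call_id) so the model can produce the final answer. A sketch of the dispatch step, with a hypothetical `toolHandlers` map and a canned `get_weather` handler standing in for a real weather API:

```typescript
// Minimal shape of a tool call from the response, as used below.
type ToolCall = { id: string; function: { name: string; arguments: string } };

// Map tool names to local implementations. The canned weather data here is a
// placeholder; a real handler would call an actual weather service.
const toolHandlers: Record<string, (args: any) => unknown> = {
  get_weather: ({ city, unit = 'celsius' }) => ({ city, unit, temp: 21 }),
};

function runToolCall(call: ToolCall): string {
  const handler = toolHandlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  const result = handler(JSON.parse(call.function.arguments));
  // Send this back to the model as:
  // { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) }
  return JSON.stringify(result);
}
```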

Embeddings API

Convert text to vector embeddings for semantic search, clustering, and RAG pipelines:

const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',  // $0.02 per 1M tokens
  input: 'The quick brown fox jumps over the lazy dog',
});

const vector = embedding.data[0].embedding;
// Returns array of 1536 numbers
// e.g. [0.0023, -0.0045, 0.0123, ...]

// Use cosine similarity to find similar texts:
function cosineSimilarity(a: number[], b: number[]) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (magA * magB);
}
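Building on that, a minimal semantic-search step ranks stored documents against a query embedding. OpenAI's text-embedding-3 vectors are unit-length, so the plain dot product equals cosine similarity; this sketch (with hypothetical `dot` and `mostSimilar` helpers) uses that shortcut:

```typescript
// Dot product; for unit-length vectors this equals cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Return the text of the document whose embedding is closest to the query.
function mostSimilar(query: number[], docs: { text: string; embedding: number[] }[]): string {
  if (docs.length === 0) throw new Error('no documents to search');
  let best = docs[0];
  for (const doc of docs) {
    if (dot(query, doc.embedding) > dot(query, best.embedding)) best = doc;
  }
  return best.text;
}
```

A real RAG pipeline would store embeddings in a vector database rather than scanning an array, but the ranking logic is the same.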

Cost Management Tips

✓ Use gpt-4o-mini for most tasks

95% cheaper than GPT-4o, handles most tasks well. Only upgrade when you need complex reasoning.

✓ Set max_tokens limits

Always set max_tokens to prevent runaway costs from unexpectedly long responses.

✓ Cache common responses

If users ask the same questions, cache responses in Redis or a database. OpenAI also offers prompt caching for long system prompts.

✓ Count tokens before sending

Use tiktoken to count tokens before making API calls. Helps you stay within budget and context limits.
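tiktoken gives exact counts; as a dependency-free fallback, a common rule of thumb is roughly 4 characters per token for English text. A sketch of that heuristic (an approximation only, not the real tokenizer):

```typescript
// Rough token estimate: ~4 characters per token for typical English text.
// Use tiktoken for exact counts before production calls.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```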

✓ Set spending limits

In the OpenAI dashboard, set monthly spending limits and usage alerts to avoid surprise bills.

Error Handling

import { OpenAI, APIError } from 'openai';

try {
  const response = await openai.chat.completions.create({ ... });
} catch (error) {
  if (error instanceof APIError) {
    switch (error.status) {
      case 401: console.error('Invalid API key'); break;
      case 429: console.error('Rate limit hit: implement exponential backoff'); break;
      case 500: console.error('OpenAI server error: retry after delay'); break;
      case 503: console.error('Service overloaded: retry with backoff'); break;
    }
  }
}
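For the retryable cases (429, 5xx), one way to implement the backoff is a small wrapper around the request. A sketch with a hypothetical `withBackoff` helper; the delay base, cap on retries, and which statuses count as retryable are all tunable assumptions:

```typescript
// Retry an async request with exponential backoff on 429 and 5xx errors.
// Delays grow as baseDelayMs * 2^attempt: 1s, 2s, 4s, ...
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt >= maxRetries) throw error;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const response = await withBackoff(() =>
//   openai.chat.completions.create({ model: 'gpt-4o-mini', messages }));
```

Note that the official SDK already retries some failures automatically (configurable via its maxRetries option), so a wrapper like this is mainly useful for custom policies.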
