Overview & Setup

What are LLMs?

Large Language Models (LLMs) are text-in, text-out APIs. From an Express server's perspective they behave like any other HTTP-based external service: you send a JSON payload and receive a JSON response. Four properties set them apart from a typical REST API call:

Latency — a single response can take 5–30 seconds for long outputs
Cost — you pay per token, not per request (input + output tokens are billed separately)
Statelessness — the model has zero memory between requests; you must re-send the full conversation history each time
Non-determinism — the same input can produce different outputs (configurable via temperature)

Understanding these four properties drives almost every architectural decision in this guide.
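
The cost property is easy to make concrete. The sketch below estimates the cost of one call from its token counts. The `PRICES` table, the `example-model` name, and the per-million-token rates are placeholders for illustration, not real provider pricing; substitute current rates from each provider's pricing page.

```javascript
// Hypothetical prices in USD per one million tokens. These numbers are
// placeholders, not real provider pricing.
const PRICES = {
  'example-model': { inputPerMillion: 2.0, outputPerMillion: 8.0 },
};

// Estimate the cost of a single call. Input and output tokens are billed
// at different rates, so they are priced separately and then summed.
function estimateCostUsd(model, usage) {
  const price = PRICES[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  const inputCost = (usage.input / 1_000_000) * price.inputPerMillion;
  const outputCost = (usage.output / 1_000_000) * price.outputPerMillion;
  return inputCost + outputCost;
}

// 1,000 input tokens and 500 output tokens at the placeholder rates above.
const cost = estimateCostUsd('example-model', { input: 1000, output: 500 });
console.log(cost.toFixed(4)); // 0.0060
```

Because the full conversation history is re-sent on every turn, the input side of this calculation grows with each exchange even when the replies stay short.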

What is a token?

A token is the atomic unit LLMs process. It is roughly 3–4 characters of English text. The sentence "Hello, how are you?" is about 6 tokens. Models have a maximum context window measured in tokens: for example, GPT-4o supports 128,000 tokens and Claude Sonnet supports 200,000. Input (your messages) and output (the model's reply) both consume from that budget.
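
For quick budgeting, the characters-per-token rule of thumb can be turned into a rough estimator. This is a heuristic sketch only: the function names and the divisor of 4 are assumptions, and accurate counts come from the provider's tokenizer or from the usage figures in the API response.

```javascript
// Rough heuristic: about 4 characters of English text per token.
// Real tokenizers differ by model and language, so treat this as an estimate.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Check whether the input texts plus the tokens reserved for the reply fit
// in a context window. Input and output count against the same window.
function fitsContextWindow(texts, contextWindow, maxOutputTokens) {
  const inputTokens = texts.reduce(
    (sum, text) => sum + estimateTokens(text),
    0,
  );
  return inputTokens + maxOutputTokens <= contextWindow;
}

console.log(estimateTokens('Hello, how are you?'));
console.log(fitsContextWindow(['Hello, how are you?'], 128000, 1024));
```

The heuristic yields 5 for the sample sentence, close to the figure of about 6 quoted above, which is the level of accuracy to expect from it.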

The messages format

Both OpenAI and Anthropic use a messages array as the primary interface. Each message has a role and content:

const messages = [
  { role: 'system',    content: 'You are a helpful assistant.' },
  { role: 'user',      content: 'What is the capital of France?' },
  { role: 'assistant', content: 'The capital of France is Paris.' },
  { role: 'user',      content: 'What is its population?' },
];

Role	Purpose
`system`	Persistent instructions — persona, output format, constraints, tone
`user`	Messages from the human
`assistant`	Previous model replies (used to build conversation history)
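
Because the model keeps no memory, the server has to carry the history and append to it on every turn. A minimal sketch of that bookkeeping follows; `createConversation` and its methods are hypothetical helpers, not part of either SDK, and the in-memory array stands in for whatever store you actually use.

```javascript
// Hypothetical helper that keeps one conversation's messages in memory.
function createConversation(systemPrompt) {
  const messages = [{ role: 'system', content: systemPrompt }];

  return {
    // Record what the human said.
    addUser(content) {
      messages.push({ role: 'user', content });
    },
    // Record the model's reply so the next request includes it.
    addAssistant(content) {
      messages.push({ role: 'assistant', content });
    },
    // Return a copy of the full history to send with the next request.
    toMessages() {
      return messages.map((message) => ({ ...message }));
    },
  };
}

const conversation = createConversation('You are a helpful assistant.');
conversation.addUser('What is the capital of France?');
conversation.addAssistant('The capital of France is Paris.');
conversation.addUser('What is its population?');

// The follow-up question only makes sense because the earlier turns are re-sent.
console.log(conversation.toMessages().length); // 4
```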

Anthropic handles system differently

OpenAI accepts system as a message inside the array. Anthropic's API takes system as a top-level field separate from the messages array. Each SDK follows its own API's shape, so a provider-agnostic layer has to translate between the two.
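
One way to bridge the two shapes is to pull system messages out of an OpenAI-style array before calling Anthropic. The sketch below assumes every message has plain string content; `splitSystemMessages` is a hypothetical helper name, not an SDK function.

```javascript
// Split an OpenAI-style messages array into the two parts Anthropic expects:
// a top-level system string and a messages array without system entries.
function splitSystemMessages(messages) {
  const system = messages
    .filter((message) => message.role === 'system')
    .map((message) => message.content)
    .join('\n\n');

  const rest = messages.filter((message) => message.role !== 'system');

  return { system, messages: rest };
}

const anthropicShape = splitSystemMessages([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' },
]);

console.log(anthropicShape.system); // You are a helpful assistant.
console.log(anthropicShape.messages.length); // 1
```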

Supported Providers

Provider	npm package	Latest models
OpenAI	`openai`	`gpt-4o`, `gpt-4.1`, `o3`
Anthropic	`@anthropic-ai/sdk`	`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`
Google	`@google/generative-ai`	`gemini-2.5-pro`, `gemini-2.5-flash`
Cohere	`cohere-ai`	`command-r-plus`

This guide focuses on OpenAI and Anthropic — they have the most mature Node.js SDKs and cover the vast majority of production use cases.

Project Setup

Install SDKs

npm install openai @anthropic-ai/sdk dotenv express-rate-limit

Enable ES modules

Add "type": "module" to your package.json to use import/export syntax throughout:

package.json
{
  "type": "module"
}

Environment variables

.env
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...

Validate keys at startup

Fail fast — crash immediately on startup if a key is missing instead of discovering it at runtime during a user request.

src/config/env.js
import 'dotenv/config';

function requireEnv(name) {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

export const OPENAI_API_KEY = requireEnv('OPENAI_API_KEY');
export const ANTHROPIC_API_KEY = requireEnv('ANTHROPIC_API_KEY');

src/app.js
import './config/env.js'; // crashes here if keys are missing

Never expose API keys to the client

Every call made with these keys is billed to your account. If a key leaks, an attacker can rack up thousands of dollars in API costs within hours. Keep them server-side only and rotate immediately if exposed.

Provider-agnostic service layer

When you want to swap providers based on config, cost, or availability, wrap both SDKs behind a single interface. Your routes call one function and never touch SDK specifics.

src/services/llm.service.js
import openai from '../lib/openai.js';
import anthropic from '../lib/anthropic.js';

const DEFAULTS = {
  openai:    { model: 'gpt-4o',           maxTokens: 1024 },
  anthropic: { model: 'claude-sonnet-4-6', maxTokens: 1024 },
};

export async function chat({ provider = 'openai', messages, systemPrompt = '', model, maxTokens, temperature }) {
  const defaults = DEFAULTS[provider];
  if (!defaults) throw new Error(`Unsupported provider: ${provider}`);

  if (provider === 'openai') {
    const fullMessages = systemPrompt
      ? [{ role: 'system', content: systemPrompt }, ...messages]
      : messages;

    const res = await openai.chat.completions.create({
      model: model || defaults.model,
      messages: fullMessages,
      max_tokens: maxTokens || defaults.maxTokens,
      temperature: temperature ?? 0.7,
    });

    return {
      content: res.choices[0].message.content,
      usage: { input: res.usage.prompt_tokens, output: res.usage.completion_tokens },
      provider: 'openai',
      model: res.model,
    };
  }

  if (provider === 'anthropic') {
    const res = await anthropic.messages.create({
      model: model || defaults.model,
      max_tokens: maxTokens || defaults.maxTokens,
      temperature: temperature ?? 1,
      // Omit the top-level system field when no system prompt is given.
      ...(systemPrompt ? { system: systemPrompt } : {}),
      messages,
    });

    return {
      content: res.content[0].text,
      usage: { input: res.usage.input_tokens, output: res.usage.output_tokens },
      provider: 'anthropic',
      model: res.model,
    };
  }
}
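
With a single chat function in place, switching on availability can be layered on top. The sketch below tries providers in order and returns the first successful result. `chatWithFallback` is a hypothetical helper: it takes the chat function as a parameter so it can be exercised with a stub, and it falls through on any error, which is a simplification. A real policy would likely separate outages and rate limits from invalid requests.

```javascript
// Try each provider in order and return the first successful result.
// chatFn is expected to accept the same options object as chat() above.
async function chatWithFallback(chatFn, providers, request) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await chatFn({ ...request, provider });
    } catch (error) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Stub standing in for the real chat function: the first provider fails.
async function stubChat({ provider }) {
  if (provider === 'openai') throw new Error('service unavailable');
  return { content: 'Hello!', provider };
}

chatWithFallback(stubChat, ['openai', 'anthropic'], {
  messages: [{ role: 'user', content: 'Hi' }],
}).then((result) => console.log(result.provider)); // anthropic
```

Passing the chat function in, rather than importing it, keeps the fallback logic free of SDK dependencies and easy to unit test.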
