Overview & Setup
What are LLMs?
Large Language Models (LLMs) are text-in, text-out APIs. From an Express server's perspective they behave exactly like any other HTTP-based external service — you send a JSON payload and receive a JSON response. What makes them different from a typical REST API call is:
- Latency — a single response can take 5–30 seconds for long outputs
- Cost — you pay per token, not per request (input + output tokens are billed separately)
- Statelessness — the model has zero memory between requests; you must re-send the full conversation history each time
- Non-determinism — the same input can produce different outputs (configurable via temperature)
Understanding these four properties drives almost every architectural decision in this guide.
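Statelessness has the most direct effect on application code. The sketch below shows one way to carry a conversation forward, using the role/content message shape this guide describes in the messages format section. The `appendTurn` helper is hypothetical and not part of either SDK.

```javascript
// Hypothetical helper: returns a new history array with one more turn appended.
// It copies instead of mutating, so a stored history is never changed by accident.
function appendTurn(history, role, content) {
  return [...history, { role, content }];
}

// The model remembers nothing, so every request must carry all earlier turns.
let history = [];
history = appendTurn(history, 'user', 'What is the capital of France?');
// ...send `history` to the provider, then store its reply:
history = appendTurn(history, 'assistant', 'The capital of France is Paris.');
history = appendTurn(history, 'user', 'What is its population?');
// The second request sends all three messages, not just the latest one.
```

Because the whole history is re-sent each time, input token cost grows with every turn of the conversation.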
What is a token?
A token is the atomic unit LLMs process. It is roughly 3–4 characters of English text. The sentence "Hello, how are you?" is about 6 tokens. Models have a maximum context window measured in tokens — GPT-4o supports 128,000 tokens, Claude Sonnet supports 200,000 tokens. Input (your messages) and output (the model's reply) both consume from that budget.
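For quick budgeting, the characters-per-token rule of thumb can be turned into a rough estimator. This is a sketch only: `estimateTokens`, `fitsContextWindow`, and the divisor of 4 are assumptions for illustration, and real counts depend on each provider's tokenizer.

```javascript
// Assumption: about 4 characters per token for English text (a heuristic, not a tokenizer).
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: rough token estimate for a single string.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: checks that the estimated input plus the tokens
// reserved for the reply stay within the model's context window.
function fitsContextWindow(messages, contextWindow, reservedForOutput) {
  const inputTokens = messages.reduce(
    (sum, message) => sum + estimateTokens(message.content),
    0
  );
  return inputTokens + reservedForOutput <= contextWindow;
}
```

An estimate like this is enough to reject an obviously oversized request before paying for it. Use the provider's own token counts when you need exact numbers.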
The messages format
Both OpenAI and Anthropic use a messages array as the primary interface. Each message has a role and content:
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' },
{ role: 'assistant', content: 'The capital of France is Paris.' },
{ role: 'user', content: 'What is its population?' },
];
| Role | Purpose |
|---|---|
| system | Persistent instructions — persona, output format, constraints, tone |
| user | Messages from the human |
| assistant | Previous model replies (used to build conversation history) |
Providers handle system differently: OpenAI accepts system as a message inside the array, while Anthropic's API takes system as a top-level field separate from the messages array. Each SDK follows its own provider's convention and does not translate between the two, so a provider-agnostic layer has to do the mapping itself.
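If your application stores conversations in the OpenAI-style shape, with system inside the array, one possible mapping to Anthropic's shape looks like the sketch below. The `toAnthropicParams` name is hypothetical, and joining several system messages with a blank line is an assumption, not something either API prescribes.

```javascript
// Hypothetical helper: pulls system messages out of an OpenAI-style array
// and returns the top-level `system` string plus the remaining messages.
function toAnthropicParams(messages) {
  const system = messages
    .filter((message) => message.role === 'system')
    .map((message) => message.content)
    .join('\n\n'); // assumption: multiple system messages are simply concatenated

  const rest = messages.filter((message) => message.role !== 'system');

  // Leave `system` out entirely when there is nothing to send.
  return system ? { system, messages: rest } : { messages: rest };
}
```

The returned object can be spread into the request parameters alongside the model and token limit.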
Supported Providers
| Provider | npm package | Latest models |
|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4.1, o3 |
| Anthropic | @anthropic-ai/sdk | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5 |
| Google | @google/generative-ai | gemini-2.5-pro, gemini-2.5-flash |
| Cohere | cohere-ai | command-r-plus |
This guide focuses on OpenAI and Anthropic — they have the most mature Node.js SDKs and cover the vast majority of production use cases.
Project Setup
Install SDKs
npm install openai @anthropic-ai/sdk dotenv express-rate-limit
Enable ES modules
Add "type": "module" to your package.json to use import/export syntax throughout:
{
"type": "module"
}
Environment variables
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...
Validate keys at startup
Fail fast — crash immediately on startup if a key is missing instead of discovering it at runtime during a user request.
import 'dotenv/config';
function requireEnv(name) {
const value = process.env[name];
if (!value) throw new Error(`Missing required environment variable: ${name}`);
return value;
}
export const OPENAI_API_KEY = requireEnv('OPENAI_API_KEY');
export const ANTHROPIC_API_KEY = requireEnv('ANTHROPIC_API_KEY');
// In your server entry file, import this module before anything else:
import './config/env.js'; // crashes here if keys are missing
All usage on these keys is billed to your account. If a key leaks, an attacker can rack up thousands of dollars in API costs within hours. Keep them server-side only and rotate immediately if exposed.
Provider-agnostic service layer
When you want to swap providers based on config, cost, or availability, wrap both SDKs behind a single interface. Your routes call one function and never touch SDK specifics.
import openai from '../lib/openai.js';
import anthropic from '../lib/anthropic.js';
const DEFAULTS = {
openai: { model: 'gpt-4o', maxTokens: 1024 },
anthropic: { model: 'claude-sonnet-4-6', maxTokens: 1024 },
};
export async function chat({ provider = 'openai', messages, systemPrompt = '', model, maxTokens, temperature }) {
const defaults = DEFAULTS[provider];
if (!defaults) throw new Error(`Unsupported provider: ${provider}`);
if (provider === 'openai') {
const fullMessages = systemPrompt
? [{ role: 'system', content: systemPrompt }, ...messages]
: messages;
const res = await openai.chat.completions.create({
model: model || defaults.model,
messages: fullMessages,
max_tokens: maxTokens || defaults.maxTokens,
temperature: temperature ?? 0.7,
});
return {
content: res.choices[0].message.content,
usage: { input: res.usage.prompt_tokens, output: res.usage.completion_tokens },
provider: 'openai',
model: res.model,
};
}
if (provider === 'anthropic') {
const res = await anthropic.messages.create({
model: model || defaults.model,
max_tokens: maxTokens || defaults.maxTokens,
temperature: temperature ?? 1,
// Only send `system` when a prompt was actually provided.
...(systemPrompt ? { system: systemPrompt } : {}),
messages,
});
return {
content: res.content[0].text,
usage: { input: res.usage.input_tokens, output: res.usage.output_tokens },
provider: 'anthropic',
model: res.model,
};
}
}
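Swapping providers based on availability can be layered on top of this function. The sketch below is one possible approach, not part of either SDK: `chatWithFallback` is a hypothetical name, and `chatFn` stands in for the `chat` function above, passed as an argument so the wrapper can be tried in isolation.

```javascript
// Hypothetical wrapper: tries each provider in order and returns the first success.
// `chatFn` is assumed to accept the same argument object as `chat` above.
async function chatWithFallback(chatFn, params, providers = ['openai', 'anthropic']) {
  let lastError;
  for (const provider of providers) {
    try {
      return await chatFn({ ...params, provider });
    } catch (err) {
      lastError = err; // remember the failure and move on to the next provider
    }
  }
  // Every provider failed (or the list was empty).
  throw lastError ?? new Error('No providers configured');
}
```

Leave `model` out of `params` when using a wrapper like this, so each provider falls back to its own default model instead of receiving a model name that belongs to the other provider.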