Conversation History & Sessions

Why History Must Be Re-sent

LLM chat APIs are stateless. Every request is independent, and the model retains nothing from previous calls. To maintain a conversation, you must send the complete message history with every request:

Turn 1: [ {user: "Hello"} ]
Turn 2: [ {user: "Hello"}, {assistant: "Hi!"}, {user: "What's 2+2?"} ]
Turn 3: [ {user: "Hello"}, {assistant: "Hi!"}, {user: "What's 2+2?"}, {assistant: "4"}, {user: "And 4+4?"} ]
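In the role/content message format used by the code later on this page, the three turns above could be built roughly as follows. This is a minimal sketch; `withUserMessage` and `withAssistantReply` are hypothetical helper names used only for illustration, not part of any SDK.

```javascript
// Hypothetical helpers (not part of any SDK): each returns a new array,
// so earlier turns are never mutated.
function withUserMessage(history, text) {
  return [...history, { role: 'user', content: text }];
}

function withAssistantReply(history, text) {
  return [...history, { role: 'assistant', content: text }];
}

// Turn 1: only the first user message is sent.
const turn1 = withUserMessage([], 'Hello');

// Turn 2: the earlier exchange is sent again, plus the new question.
const turn2 = withUserMessage(withAssistantReply(turn1, 'Hi!'), "What's 2+2?");

// Turn 3: the request keeps growing with every exchange.
const turn3 = withUserMessage(withAssistantReply(turn2, '4'), 'And 4+4?');

console.log(turn1.length, turn2.length, turn3.length); // 1 3 5
```

Each request carries everything that came before it, so request size (and token cost) grows with the length of the conversation.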

In-memory Sessions (Development Only)

// Simple in-memory session store (not for production)
const sessions = new Map();

function getHistory(sessionId) {
  return sessions.get(sessionId) || [];
}

function appendHistory(sessionId, userMessage, assistantReply) {
  const history = getHistory(sessionId);
  history.push(
    { role: 'user',      content: userMessage },
    { role: 'assistant', content: assistantReply }
  );
  sessions.set(sessionId, history);
  return history;
}

This is fine for local development, but all history is lost on every restart and is not shared between server instances. Never use it in production.

Redis Sessions (Production)

In production, use Redis so history survives restarts and scales across multiple server instances.

npm install ioredis uuid

src/lib/redis.js
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');

export default redis;

src/services/session.service.js
import redis from '../lib/redis.js';

const SESSION_TTL = 60 * 60 * 24; // 24 hours in seconds

export async function getHistory(sessionId) {
  const data = await redis.get(`chat:${sessionId}`);
  return data ? JSON.parse(data) : [];
}

export async function saveHistory(sessionId, messages) {
  await redis.setex(`chat:${sessionId}`, SESSION_TTL, JSON.stringify(messages));
}

export async function clearHistory(sessionId) {
  await redis.del(`chat:${sessionId}`);
}

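src/routes/chat.route.js (with Redis sessions)
// Assumes an Express app; the router is mounted by the main server file.
import { Router } from 'express';
import { v4 as uuidv4 } from 'uuid';
import { chat } from '../services/llm.service.js';
import { getHistory, saveHistory, clearHistory } from '../services/session.service.js';

const router = Router();

// Start or continue a conversation
router.post('/', async (req, res, next) => {
  try {
    // A missing sessionId starts a new conversation
    const { sessionId = uuidv4(), message, provider = 'openai' } = req.body;

    // Reject empty input before it is written into the stored history
    if (typeof message !== 'string' || message.trim() === '') {
      return res.status(400).json({ error: 'message is required' });
    }

    const history = await getHistory(sessionId);
    history.push({ role: 'user', content: message });

    const result = await chat({ provider, messages: history });

    // Persist only after the model has replied, so a failed call leaves the stored history unchanged
    history.push({ role: 'assistant', content: result.content });
    await saveHistory(sessionId, history);

    res.json({ sessionId, reply: result.content, usage: result.usage });
  } catch (err) {
    next(err);
  }
});

// Delete a session
router.delete('/:sessionId', async (req, res, next) => {
  try {
    await clearHistory(req.params.sessionId);
    res.json({ message: 'Session cleared' });
  } catch (err) {
    next(err);
  }
});

export default router;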

Use a TTL

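Set a reasonable TTL on session keys (e.g. 24 hours). Abandoned conversations are rarely resumed, and without a TTL their keys accumulate in Redis indefinitely. Because saveHistory writes with SETEX, the TTL restarts every time a message is saved, so it counts from the last activity rather than from the start of the session.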
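To inspect or extend a session's remaining lifetime without rewriting its history, something like the following sketch could work. It assumes `client` exposes ioredis-style `ttl` and `expire` methods and that keys follow the `chat:${sessionId}` pattern used above; `sessionTimeLeft` and `touchSession` are hypothetical helper names, not part of ioredis.

```javascript
// Sketch only: `client` is assumed to expose ioredis-style `ttl` and `expire`.
// `sessionTimeLeft` and `touchSession` are hypothetical helpers.
const SESSION_TTL = 60 * 60 * 24; // 24 hours in seconds

// Seconds until the session expires, or null if it no longer exists.
async function sessionTimeLeft(client, sessionId) {
  const seconds = await client.ttl(`chat:${sessionId}`);
  // Redis TTL returns -2 for a missing key and -1 for a key without an expiry.
  if (seconds === -2) return null;
  return seconds === -1 ? Infinity : seconds;
}

// Restart the expiry clock without rewriting the stored history.
async function touchSession(client, sessionId, ttl = SESSION_TTL) {
  // EXPIRE returns 1 when the timeout was set and 0 when the key is missing.
  return (await client.expire(`chat:${sessionId}`, ttl)) === 1;
}
```

Passing the client in as a parameter keeps the helpers easy to exercise against a stub; in the service file they could just as well use the shared `redis` instance directly.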

Context Window Management

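Long conversations eventually exceed the model's context window. When the history grows too large, drop the oldest messages that follow the system prompt, always keeping the system prompt and the most recent exchanges. The token count below is only a rough estimate, so set maxTokens comfortably below the model's actual limit.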

src/utils/trimHistory.js
function estimateTokens(text) {
  return Math.ceil(text.length / 4); // rough approximation: 1 token ≈ 4 chars
}

export function trimHistory(messages, maxTokens = 80000) {
  let totalTokens = messages.reduce((sum, m) => {
    const content = typeof m.content === 'string' ? m.content : JSON.stringify(m.content);
    return sum + estimateTokens(content);
  }, 0);

  // Drop the oldest non-system message until the estimate fits.
  // Note: this trims the `messages` array in place.
  const start = messages[0]?.role === 'system' ? 1 : 0;
  while (totalTokens > maxTokens && messages.length > 4) {
    const removed = messages.splice(start, 1)[0];
    const content = typeof removed.content === 'string' ? removed.content : JSON.stringify(removed.content);
    totalTokens -= estimateTokens(content);
  }

  return messages;
}

Usage in a route:

let history = await getHistory(sessionId);
history.push({ role: 'user', content: message });
history = trimHistory(history); // trim before sending
const result = await chat({ provider, messages: history });
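Removing one message at a time can leave an assistant reply without the user message that prompted it. A possible variant, sketched below, drops whole user/assistant pairs instead and returns a new array rather than trimming in place. It assumes the history alternates user and assistant messages after an optional system message; `trimHistoryByTurns` is a hypothetical name, and the token estimate is the same rough 4-characters-per-token heuristic.

```javascript
// Same rough heuristic as above: 1 token ≈ 4 characters.
function estimateTokens(message) {
  const content = typeof message.content === 'string'
    ? message.content
    : JSON.stringify(message.content);
  return Math.ceil(content.length / 4);
}

// Hypothetical variant: drops the oldest user/assistant pair at a time
// and returns a new array instead of mutating the input.
function trimHistoryByTurns(messages, maxTokens = 80000) {
  const hasSystem = messages.length > 0 && messages[0].role === 'system';
  const head = hasSystem ? [messages[0]] : [];
  const rest = hasSystem ? messages.slice(1) : messages.slice();

  let total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);

  // Keep at least the newest exchange plus the pending user message.
  while (total > maxTokens && rest.length > 3) {
    const dropped = rest.splice(0, 2); // oldest user + assistant pair
    total -= dropped.reduce((sum, m) => sum + estimateTokens(m), 0);
  }

  return [...head, ...rest];
}
```

Called as `trimHistoryByTurns(history)` just before the request, this keeps the trimmed history starting with a user message as long as the stored history alternates roles.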

OpenAI Assistants API (Thread-Based History)

The Assistants API offloads history management to OpenAI. You create a Thread once and add messages to it — OpenAI maintains the history server-side. See the Assistants API page for a full implementation.

Approach	Where history lives	Best for
Manual (Redis)	Your server	Full control, any provider
Assistants API	OpenAI servers	Persistent assistants, file attachments
