OpenAI API in Production: Patterns That Actually Work
Everyone is integrating OpenAI. Few are doing it in a way that holds up at scale. Here's what I've learned after shipping AI features into real production systems.
The Core Problem
LLMs are non-deterministic, slow, and expensive. Your architecture needs to account for all three.
Pattern 1: Always Stream
Never make users wait for a full completion. Perceived latency matters more than total latency: users abandon requests that show nothing for more than a few seconds, while a stream of tokens keeps them engaged.
// Stream from your Next.js API route
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  const encoder = new TextEncoder();
  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content ?? '';
          if (text) controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
        controller.close();
      },
    }),
    { headers: { 'Content-Type': 'text/event-stream' } }
  );
}
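On the client, the stream can be consumed with fetch and a reader. A minimal sketch — the endpoint path /api/chat, the parseSSEFrame helper, and the streamChat function are illustrative names, not from the route above:

```typescript
// Parse one SSE frame emitted by the route above: `data: {"text":"..."}`.
// Returns the token text, or null for [DONE] and non-data lines.
function parseSSEFrame(frame: string): string | null {
  if (!frame.startsWith('data: ')) return null;
  const payload = frame.slice('data: '.length);
  if (payload === '[DONE]') return null;
  return (JSON.parse(payload) as { text: string }).text;
}

// Read the response body incrementally and invoke onText per token.
async function streamChat(prompt: string, onText: (t: string) => void): Promise<void> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const frames = buffer.split('\n\n');
    buffer = frames.pop() ?? ''; // keep the trailing partial frame for the next chunk
    for (const frame of frames) {
      const text = parseSSEFrame(frame);
      if (text !== null) onText(text);
    }
  }
}
```

The buffering matters: network chunks don't align with SSE frame boundaries, so you split on the frame delimiter and carry the partial tail forward.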
Pattern 2: Cache Aggressively
AI calls are expensive. Cache everything you can.
import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = new Redis({ url: process.env.UPSTASH_URL!, token: process.env.UPSTASH_TOKEN! });

async function cachedCompletion(prompt: string): Promise<string> {
  // Hash the full prompt: a truncated base64 key would collide for prompts
  // that share a long prefix, serving the wrong cached answer
  const key = `ai:${createHash('sha256').update(prompt).digest('hex')}`;
  const cached = await redis.get<string>(key);
  if (cached) return cached;

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
  });

  const result = response.choices[0].message.content ?? '';
  await redis.setex(key, 3600, result); // cache for 1 hour
  return result;
}
Pattern 3: Rate Limit Per User
async function checkRateLimit(userId: string): Promise<boolean> {
  const key = `ratelimit:ai:${userId}`;
  const requests = await redis.incr(key);
  if (requests === 1) {
    await redis.expire(key, 3600); // first request in the window starts the clock
  }
  return requests <= 50; // 50 AI requests per hour
}
Pattern 4: Structured Outputs
Stop regex-parsing free-form text. JSON mode forces the model to emit syntactically valid JSON.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: 'Always respond with valid JSON matching the schema: { score: number, reason: string, tags: string[] }',
    },
    { role: 'user', content: userInput },
  ],
});
const parsed = JSON.parse(response.choices[0].message.content!);
// JSON mode guarantees valid JSON, but not that it matches your schema —
// validate at runtime before trusting the shape
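In practice JSON.parse returns `any`, so a runtime check is what actually establishes the type. A minimal hand-rolled guard for the schema in the system prompt — a schema library like Zod is the more common choice; ReviewResult and the sample payload are illustrative:

```typescript
interface ReviewResult {
  score: number;
  reason: string;
  tags: string[];
}

// Narrows `unknown` to ReviewResult; TypeScript only "knows" the shape
// after a guard like this passes.
function isReviewResult(value: unknown): value is ReviewResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.score === 'number' &&
    typeof v.reason === 'string' &&
    Array.isArray(v.tags) &&
    v.tags.every((t) => typeof t === 'string')
  );
}

const data: unknown = JSON.parse('{"score": 4, "reason": "solid", "tags": ["api"]}');
if (!isReviewResult(data)) {
  throw new Error('Model returned JSON outside the expected schema');
}
// data is now typed as ReviewResult
```

Models occasionally drift from the requested schema even in JSON mode, so treat a failed check as a retryable error rather than a crash.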
The Bottom Line
AI features are a multiplier on your product — but only if the plumbing is solid. Get streaming, caching, and rate limiting right before you ship anything to users.