Reading the docs and building a toy "ask a question, get an answer" demo takes about 20 minutes with the Claude API. Building something production-ready — with proper streaming, error handling, cost controls, and useful system prompts — takes a lot more thought.
This guide is about that second part. We'll cover the patterns that matter once you move past hello-world.
```bash
npm install @anthropic-ai/sdk
```

```ts
// lib/claude.ts
import Anthropic from '@anthropic-ai/sdk'

export const claude = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
})
```

Always initialize the client once and export it. Creating a new instance per request wastes resources and loses connection pooling benefits.
Nobody wants to wait 8 seconds for a response to appear all at once. Always stream for user-facing features:
```ts
// app/api/chat/route.ts (Next.js)
import { claude } from '@/lib/claude'

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json()

  const stream = await claude.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: systemPrompt,
    messages,
    stream: true,
  })

  // Return a ReadableStream to the client
  const encoder = new TextEncoder()
  const readable = new ReadableStream({
    async start(controller) {
      for await (const event of stream) {
        if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
          controller.enqueue(encoder.encode(event.delta.text))
        }
      }
      controller.close()
    },
  })

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Transfer-Encoding': 'chunked',
    },
  })
}
```

```tsx
// components/ChatStream.tsx — consuming the streamed response
'use client'
import { useState } from 'react'

export default function ChatStream() {
  const [response, setResponse] = useState('')
  const [loading, setLoading] = useState(false)

  async function sendMessage(message: string) {
    setLoading(true)
    setResponse('')

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: message }],
        systemPrompt: 'You are a helpful coding assistant.',
      }),
    })

    const reader = res.body!.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // { stream: true } avoids splitting multi-byte characters across chunk boundaries
      setResponse(prev => prev + decoder.decode(value, { stream: true }))
    }

    setLoading(false)
  }

  return (
    <div>
      <button onClick={() => sendMessage('Explain async/await in JavaScript')}>
        Ask Claude
      </button>
      {loading && <p>Thinking...</p>}
      <div>{response}</div>
    </div>
  )
}
```

A weak system prompt produces generic responses. A strong one shapes every response in the conversation. Invest time here.
```ts
const CODE_REVIEW_SYSTEM_PROMPT = `You are a senior software engineer doing a thorough code review.

When reviewing code:
- Focus on correctness, security, and performance, in that order
- Point out specific line numbers when referencing issues
- Explain WHY something is a problem, not just that it is one
- Suggest concrete improvements with example code
- Be direct but constructive — you're helping a colleague, not grading homework

Format your review as:
1. Summary (2-3 sentences)
2. Critical issues (must fix before merging)
3. Suggestions (nice to have)
4. What's done well

If there are no critical issues, say so clearly.`
```

```ts
const DOCUMENTATION_SYSTEM_PROMPT = `You are a technical writer who specializes in developer documentation.

Rules:
- Write for developers, not managers
- Use active voice
- Include runnable code examples for every concept
- Don't explain what something is — explain when and why to use it
- Keep paragraphs to 3 sentences maximum
- Use second person ("you"), not third person ("the developer")`
```

Tool use (function calling) lets Claude decide to call a function when it needs external data:
```ts
import Anthropic from '@anthropic-ai/sdk'

const claude = new Anthropic()

const tools: Anthropic.Tool[] = [
  {
    name: 'get_post_by_slug',
    description: 'Fetch a blog post by its URL slug. Use this when the user asks about a specific post.',
    input_schema: {
      type: 'object',
      properties: {
        slug: {
          type: 'string',
          description: 'The URL slug of the post, e.g. "nextjs-server-components"',
        },
      },
      required: ['slug'],
    },
  },
  {
    name: 'search_posts',
    description: 'Search blog posts by keyword. Use when the user asks to find posts about a topic.',
    input_schema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
        limit: { type: 'number', description: 'Max results to return (default 5)' },
      },
      required: ['query'],
    },
  },
]

async function chatWithTools(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage },
  ]

  while (true) {
    const response = await claude.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      tools,
      messages,
    })

    if (response.stop_reason === 'tool_use') {
      // Add Claude's response (with tool calls) to history
      messages.push({ role: 'assistant', content: response.content })

      // Execute each tool call
      const toolResults: Anthropic.ToolResultBlockParam[] = []
      for (const block of response.content) {
        if (block.type !== 'tool_use') continue
        let result: string
        if (block.name === 'get_post_by_slug') {
          // getPostBySlug and searchPosts are your own data-access functions
          const post = await getPostBySlug((block.input as any).slug)
          result = post ? JSON.stringify(post) : 'Post not found'
        } else if (block.name === 'search_posts') {
          const posts = await searchPosts((block.input as any).query)
          result = JSON.stringify(posts)
        } else {
          result = 'Unknown tool'
        }
        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: result,
        })
      }

      // Add tool results to history and continue the loop
      messages.push({ role: 'user', content: toolResults })
      continue
    }

    // end_turn (or any other stop reason): return the text and exit —
    // falling through here prevents an infinite loop if the model stops early
    const textBlock = response.content.find(b => b.type === 'text')
    return textBlock?.type === 'text' ? textBlock.text : ''
  }
}
```

If your system prompt is long (documentation, a large context document), prompt caching can reduce costs by up to 90% on repeated calls:
```ts
const response = await claude.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: longSystemPrompt, // your 2,000-word system prompt
      cache_control: { type: 'ephemeral' }, // cache this block
    },
  ],
  messages: [{ role: 'user', content: userMessage }],
})
```

The first call processes and caches the system prompt. Subsequent calls within the cache TTL (5 minutes) pay only the cache read price, which is ~10x cheaper than re-processing the prompt.
```ts
import Anthropic from '@anthropic-ai/sdk'
import { claude } from '@/lib/claude'

export async function callClaude(prompt: string, retries = 2): Promise<string> {
  try {
    const response = await claude.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    })
    const text = response.content[0]
    return text?.type === 'text' ? text.text : ''
  } catch (error) {
    if (error instanceof Anthropic.RateLimitError && retries > 0) {
      await new Promise(r => setTimeout(r, 2000))
      return callClaude(prompt, retries - 1)
    }
    if (error instanceof Anthropic.APIError) {
      // Retry server-side errors; surface everything else
      if (error.status && error.status >= 500 && retries > 0) {
        await new Promise(r => setTimeout(r, 1000))
        return callClaude(prompt, retries - 1)
      }
      throw new Error(`Claude API error: ${error.status} ${error.message}`)
    }
    throw error
  }
}
```

1. Setting `max_tokens` thoughtlessly. The Messages API requires it, but don't just pick a huge number: too small a value truncates responses unexpectedly, and too large a value lets a runaway response burn through tokens.
2. Putting conversation history in the system prompt. System prompts are for instructions, not history. Use the `messages` array for conversation turns.
3. Not handling `stop_reason`. Always check why the model stopped — `end_turn` vs `max_tokens` vs `tool_use` require different handling.
4. Building without observability. Log every API call in production: model, tokens used, latency, stop reason. You'll need this for debugging and cost tracking.
5. Exposing your API key client-side. Always proxy Claude calls through your backend. Never call the Anthropic API directly from the browser.
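Pitfall 4 is cheap to fix up front. Here is a minimal sketch of a logging wrapper — the `LogEntry` shape and the `console.log` sink are assumptions, so wire it into whatever observability stack you actually use:

```typescript
// Records the metrics pitfall 4 calls for: model, tokens, stop reason, latency.
interface LogEntry {
  model: string
  inputTokens: number
  outputTokens: number
  stopReason: string | null
  latencyMs: number
}

// The subset of response fields we read (matches the shape of an API message)
interface MessageLike {
  model: string
  stop_reason: string | null
  usage: { input_tokens: number; output_tokens: number }
}

async function withLogging<T extends MessageLike>(
  call: () => Promise<T>,
  log: (entry: LogEntry) => void = entry => console.log(JSON.stringify(entry)),
): Promise<T> {
  const started = Date.now()
  const response = await call()
  log({
    model: response.model,
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    stopReason: response.stop_reason,
    latencyMs: Date.now() - started,
  })
  return response
}

// Usage with the real client would look like:
// const msg = await withLogging(() => claude.messages.create({ ... }))
```

Because the wrapper only needs the response's `model`, `usage`, and `stop_reason` fields, it works unchanged for tool-use loops and plain calls alike.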