Obtaining a List of Claude Models Using the Anthropic API
By using the Anthropic API's /v1/models endpoint, you can programmatically retrieve a list of available Claude models along with their specifications.
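As a rough illustration, the request could be built in Python as follows. This is a minimal sketch, not a definitive client: the `anthropic-version` value and the assumption that the response carries its entries under a `data` key with an `id` field are taken from Anthropic's public API conventions and should be checked against the current documentation. The function names are made up for this example.

```python
import json
import urllib.request

# Assumptions: endpoint URL and version header as commonly documented.
API_URL = "https://api.anthropic.com/v1/models"
API_VERSION = "2023-06-01"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model list."""
    return urllib.request.Request(
        API_URL,
        headers={"x-api-key": api_key, "anthropic-version": API_VERSION},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of a decoded response body (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key: str) -> list:
    """Send the request and return the model IDs (performs network I/O)."""
    with urllib.request.urlopen(build_models_request(api_key)) as resp:
        return extract_model_ids(json.load(resp))
```

Calling `fetch_model_ids(os.environ["ANTHROPIC_API_KEY"])` would then return the IDs; the environment variable name is likewise only a convention.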
Claude Code is Anthropic's official coding agent, but by setting environment variables you can use other models such as DeepSeek V4 Pro via OpenRouter. This article explains how to configure that.
This article compares the pricing, specs, and performance of OpenAI's current API models: GPT-5.4, GPT-5.4 nano, GPT-5.4 mini, GPT-4o, and GPT-4o mini, along with guidance on which model to choose for different use cases.
Unit: USD / 1M tokens (MTok). Information as of April 2026.
| Model | Input | Cached Input | Output |
|---|---|---|---|
| GPT-5.4 | $2.50 | $1.25 | $15.00 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 |
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 |
GPT-4o mini is the cheapest on both input and output, but its knowledge cutoff is October 2023, making it unsuitable for tasks requiring up-to-date information. GPT-5.4 nano has nearly the same input cost as GPT-4o mini, while offering GPT-5.4 family quality and knowledge up to August 2025. GPT-5.4 (flagship) matches GPT-4o on input cost but has a high output cost of $15.00/MTok, making it best suited for tasks that demand top-quality reasoning.
When using regional processing endpoints, a 10% surcharge applies to the GPT-5.4 series.
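To make the price differences concrete, the table can be turned into a small cost estimator. The sketch below copies the prices from the table above; the function name and the way the regional surcharge is applied (a flat 10% on the whole request for GPT-5.4 models) are simplifying assumptions for illustration.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "GPT-5.4":      {"input": 2.50, "cached": 1.25,  "output": 15.00},
    "GPT-5.4 mini": {"input": 0.75, "cached": 0.075, "output": 4.50},
    "GPT-5.4 nano": {"input": 0.20, "cached": 0.02,  "output": 1.25},
    "GPT-4o":       {"input": 2.50, "cached": 1.25,  "output": 10.00},
    "GPT-4o mini":  {"input": 0.15, "cached": 0.075, "output": 0.60},
}


def estimate_cost(model, input_tokens, output_tokens,
                  cached_tokens=0, regional=False):
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens billed at the cached rate.
    """
    price = PRICES[model]
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * price["input"]
        + cached_tokens * price["cached"]
        + output_tokens * price["output"]
    ) / 1_000_000
    # Assumption: the 10% regional surcharge applies to the whole request.
    if regional and model.startswith("GPT-5.4"):
        cost *= 1.10
    return cost
```

For example, `estimate_cost("GPT-5.4", 1_000_000, 1_000_000)` gives 17.5, against 0.75 for the same volume on GPT-4o mini.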
| Model | Context | Max Output | Image Input | Knowledge Cutoff |
|---|---|---|---|---|
| GPT-5.4 | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 mini | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 nano | 400K | 128K | ✓ | August 2025 |
| GPT-4o | 128K | 16,384 | ✓ | October 2023 |
| GPT-4o mini | 128K | 16,384 | ✓ | October 2023 |
The GPT-5.4 series dramatically expands the context window to 400K tokens and supports up to 128K tokens of output. GPT-4o and GPT-4o mini are capped at 128K context and 16K output.
The flagship model of the GPT-5.4 family. It represents the highest intelligence available from OpenAI in the current generation, significantly outperforming GPT-5.4 mini in complex reasoning, long-form generation, and advanced coding. It supports all native tools including computer use, MCP, and web search, with full multimodal input/output support. Given the high output cost of $15.00/MTok, it is most effective when reserved for tasks where top-quality output is essential.
The mid-tier model of the GPT-5.4 family, optimized for coding, computer use, and sub-agent tasks. It consistently outperforms GPT-5 mini and achieves pass rates close to the flagship GPT-5.4 with faster processing. Benchmarks show a 2× or greater speed improvement over GPT-5 mini, offering the best performance/latency trade-off for coding workflows.
The smallest and most affordable model in the GPT-5.4 family. Optimized for high-volume use cases where speed and cost are the top priorities — such as classification, data extraction, ranking, and coding sub-agents. Not suited for complex tasks requiring deep reasoning.
The general-purpose flagship model with high intelligence for both text and image tasks. It is now a legacy model, superseded by the GPT-5.4 series. GPT-4o was retired from ChatGPT in February 2026, but API access remains available.
Designed as a compact model ideal for fine-tuning. Achieves results comparable to larger models (GPT-4o) at lower cost and latency through distillation. MMLU score: 82.0%. Best suited for minimizing inference costs on simple tasks.
For cost-effectiveness, the two standout models are GPT-5.4 nano and GPT-5.4 mini.
GPT-5.4 nano has nearly the same input cost as GPT-4o mini ($0.20 vs $0.15), yet offers a 400K context window, knowledge up to August 2025, and full access to native tools such as web search, file search, and MCP. It surpasses GPT-4o mini in almost every dimension except raw price, so switching to GPT-5.4 nano makes sense for any use case that doesn't require fine-tuning.
GPT-5.4 mini is cheaper than GPT-4o on input ($0.75 vs $2.50/MTok) and output ($4.50 vs $10.00/MTok) while outperforming GPT-4o in coding and agentic workflows. If you regularly use GPT-4o, switching to GPT-5.4 mini is likely to cut costs and improve performance at the same time.
On the other hand, GPT-4o now feels overpriced. Its input cost matches GPT-5.4 ($2.50/MTok), yet it falls behind in context size, knowledge recency, and tool support. Unless you specifically need fine-tuning or compatibility with existing systems, there is little reason to actively choose GPT-4o.
By setting up mitmproxy as a man-in-the-middle proxy, you can monitor the API traffic that AI coding tools make in real time.
Many AI coding tools are Node.js applications that communicate with external APIs over HTTPS. By inserting mitmproxy as a man-in-the-middle proxy and configuring Node.js to trust the mitmproxy CA certificate, you can decrypt the encrypted traffic and inspect it in real time.
The easiest way to install mitmproxy is via uv.
uv tool install mitmproxy
Set the following environment variables before launching the tool.
$env:HTTPS_PROXY = "http://127.0.0.1:8080"
$env:HTTP_PROXY = "http://127.0.0.1:8080"
$env:NODE_EXTRA_CA_CERTS = "$env:USERPROFILE\.mitmproxy\mitmproxy-ca-cert.pem"
Setting only HTTPS_PROXY and HTTP_PROXY causes Node.js TLS verification to fail: when relaying HTTPS traffic, mitmproxy presents certificates signed by its own locally generated CA, which Node.js does not trust by default.
By specifying the path to the mitmproxy CA certificate in NODE_EXTRA_CA_CERTS, Node.js will trust it and the connection will succeed.
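The same three variables can also be set for a single child process instead of the whole shell session. The following is a minimal sketch; the helper name `proxy_env` is made up for this example, and the proxy address and certificate location simply mirror the values used above.

```python
import os
from pathlib import Path


def proxy_env(base_env, home, proxy_url="http://127.0.0.1:8080"):
    """Return a copy of base_env with the mitmproxy settings added."""
    env = dict(base_env)  # copy so the caller's environment is untouched
    env["HTTPS_PROXY"] = proxy_url
    env["HTTP_PROXY"] = proxy_url
    # Default location where mitmproxy writes its CA certificate.
    env["NODE_EXTRA_CA_CERTS"] = str(
        Path(home) / ".mitmproxy" / "mitmproxy-ca-cert.pem"
    )
    return env
```

A tool could then be started with `subprocess.run([...], env=proxy_env(os.environ, Path.home()))`, so that only that process is routed through the proxy.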
mitmproxy automatically generates a CA certificate on first launch and saves it to ~\.mitmproxy\. If you haven't generated it yet, simply start mitmproxy once.
mitmweb
A browser window opens automatically, showing the proxy management UI at http://127.0.0.1:8081.
Launch the tool in a separate terminal with the environment variables set.
Once you start using the tool, requests will appear in the mitmweb UI.
The tool sends requests to the following endpoint.
POST https://api.example.com/v1/messages
x-service-version: ...
content-type: application/json
x-api-key: sk-...
{
"model": "model-name",
"max_tokens": 16000,
"stream": true,
"system": [
{
"type": "text",
"text": "..."
}
],
"messages": [
{
"role": "user",
"content": "..."
}
],
"tools": [
{
"name": "Read",
"description": "...",
"input_schema": {}
}
]
}
When stream: true is set, the response arrives as a stream of Server-Sent Events (SSE).
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
...
data: {"type":"message_stop"}
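To read the generated text out of a captured stream, the `text_delta` events can be concatenated. This is a minimal sketch that assumes only the event shapes shown above; other event types, and any line that is not valid JSON, are skipped.

```python
import json


def collect_text(sse_lines):
    """Concatenate the text deltas found in an iterable of SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore comments, event names, blank lines
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON payload
        if not isinstance(event, dict):
            continue
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event.get("type") == "message_stop":
            break  # end of the message
    return "".join(parts)
```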
| Item | Details |
|---|---|
| API endpoint | api.example.com/v1/messages |
| Authentication | API key (x-api-key header) |
| Streaming | SSE format |
| Tool definitions | Included in every request |
| System prompt | Thousands to tens of thousands of tokens |
The system prompt contains the AI coding tool's operating principles, descriptions of available tools, and usage guidelines.
This covers how to call Gemini models via Google Cloud's Vertex AI from PowerShell. Both the OpenAI-compatible endpoint and the native Gemini endpoint are explained.
With gcloud authentication, no API key is required; it uses your existing Google Cloud credentials.
$accessToken = (gcloud auth print-access-token)
$apiKey = $env:VERTEX_API_KEY
https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions
The request and response format is identical to the OpenAI API. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).
https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent
For streaming, use :streamGenerateContent.
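The two URL patterns differ only in API version and path, which is easy to see when they are generated from the same inputs. A minimal sketch follows; the function names are made up for this example, and the templates are copied from the URLs above.

```python
def openai_compatible_url(project_id, region):
    """URL of the OpenAI-compatible chat completions endpoint."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1beta1"
        f"/projects/{project_id}/locations/{region}"
        "/endpoints/openapi/chat/completions"
    )


def native_gemini_url(project_id, region, model, stream=False):
    """URL of the native Gemini endpoint for a given model."""
    method = "streamGenerateContent" if stream else "generateContent"
    return (
        f"https://{region}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{region}"
        f"/publishers/google/models/{model}:{method}"
    )
```

Note that the native URL takes the bare model name, while the `google/` prefix belongs in the request body of the OpenAI-compatible call.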
$projectId = "your-project-id"
$region = "us-central1"
$model = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)
$body = @{
model = $model
messages = @(
@{
role = "user"
content = "What is the population of Tokyo?"
}
)
} | ConvertTo-Json -Depth 10
$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"
$response = Invoke-RestMethod `
-Uri $uri `
-Method Post `
-ContentType "application/json" `
-Headers @{ Authorization = "Bearer $accessToken" } `
-Body $body
$response.choices[0].message.content
$projectId = "your-project-id"
$region = "us-central1"
$model = "gemini-2.5-flash-lite"
$apiKey = $env:VERTEX_API_KEY
$body = @{
contents = @(
@{
role = "user"
parts = @(
@{ text = "What is the population of Tokyo?" }
)
}
)
} | ConvertTo-Json -Depth 10
$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"
$response = Invoke-RestMethod `
-Uri $uri `
-Method Post `
-ContentType "application/json" `
-Body $body
$response.candidates[0].content.parts[0].text
$response.choices[0].message.content # generated text
$response.usage.total_tokens # total token count
$response.model # model used
$response.candidates[0].content.parts[0].text # generated text
$response.usageMetadata.totalTokenCount # total token count
$response.modelVersion # model version used
For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.
$fullText = ($response | ForEach-Object {
$_.candidates[0].content.parts[0].text
}) -join ""
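For comparison, the same concatenation could look like this in Python once the chunks have been decoded into dictionaries. This is only a sketch: unlike the PowerShell one-liner, it tolerates chunks without candidates or parts and joins every text part of the first candidate, which is an assumption about how you want such chunks handled.

```python
def join_chunks(chunks):
    """Join the text of the first candidate across streamed chunks."""
    pieces = []
    for chunk in chunks:
        candidates = chunk.get("candidates", [])
        if not candidates:
            continue  # e.g. a chunk carrying only usage metadata
        parts = candidates[0].get("content", {}).get("parts", [])
        for part in parts:
            pieces.append(part.get("text", ""))
    return "".join(pieces)
```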
$body = @{
model = $model
messages = @(
@{
role = "system"
content = "You are an AI assistant that responds in Japanese. Answer concisely."
}
@{
role = "user"
content = "What is the speed of light?"
}
)
} | ConvertTo-Json -Depth 10
$body = @{
system_instruction = @{
parts = @(
@{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
)
}
contents = @(
@{
role = "user"
parts = @(@{ text = "What is the speed of light?" })
}
)
} | ConvertTo-Json -Depth 10
To hold a multi-turn conversation, pass the conversation history as an array.
$body = @{
model = $model
messages = @(
@{ role = "user"; content = "Do you prefer cats or dogs?" }
@{ role = "assistant"; content = "I prefer cats." }
@{ role = "user"; content = "Why is that?" }
)
} | ConvertTo-Json -Depth 10
In the native Gemini format, the assistant role is specified as "model".
$body = @{
contents = @(
@{
role = "user"
parts = @(@{ text = "Do you prefer cats or dogs?" })
}
@{
role = "model"
parts = @(@{ text = "I prefer cats." })
}
@{
role = "user"
parts = @(@{ text = "Why is that?" })
}
)
} | ConvertTo-Json -Depth 10
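The differences between the two request formats (the `model` role, `parts` instead of `content`, and a separate `system_instruction`) can be summed up as a conversion function. The sketch below assumes plain-text messages only and a hypothetical function name; it is not part of either API.

```python
def to_gemini_body(messages):
    """Convert OpenAI-style chat messages into a native Gemini body."""
    contents = []
    system_texts = []
    for message in messages:
        role, text = message["role"], message["content"]
        if role == "system":
            # System messages move out of the conversation.
            system_texts.append(text)
        elif role == "assistant":
            contents.append({"role": "model", "parts": [{"text": text}]})
        elif role == "user":
            contents.append({"role": "user", "parts": [{"text": text}]})
        else:
            raise ValueError(f"unsupported role: {role}")
    body = {"contents": contents}
    if system_texts:
        body["system_instruction"] = {
            "parts": [{"text": text} for text in system_texts]
        }
    return body
```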
| Model | OpenAI-compatible name | Description |
|---|---|---|
| gemini-2.5-flash-lite | google/gemini-2.5-flash-lite | Lightweight, fast, low-cost |
| gemini-2.5-flash | google/gemini-2.5-flash | Balanced |
| gemini-2.5-pro | google/gemini-2.5-pro | High-precision, for complex tasks |
| Situation | Recommended approach |
|---|---|
| GCP-authenticated environment (dev, CI, etc.) | OpenAI-compatible + gcloud auth |
| Only an API key available | Native Gemini |
| Migrating from OpenAI | OpenAI-compatible (minimizes code changes) |
| Streaming required | Native Gemini |
When using Claude via API, you have several options: in addition to calling the Anthropic API directly, you can also use it via AWS Bedrock, Google Vertex AI, or Microsoft Azure (Azure AI Foundry). Base pricing is the same across all routes, but there are differences in batch processing and cloud ecosystem integration.
Unit: USD / 1M tokens (MTok). Information as of March 2026.
| Model | Type | Anthropic API | Bedrock | Vertex AI | Azure |
|---|---|---|---|---|---|
| Claude Opus 4.6 | Input | $5.00 | $5.00 | $5.00 | $5.00 |
| | Output | $25.00 | $25.00 | $25.00 | $25.00 |
| Claude Sonnet 4.6 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |
| Claude Haiku 4.5 | Input | $1.00 | $1.00 | $1.00 | $1.00 |
| | Output | $5.00 | $5.00 | $5.00 | $5.00 |
| Claude Sonnet 4.5 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |
Base pricing is identical across all routes.
Note that Vertex AI regional endpoints carry a 10% surcharge over global endpoint pricing. Bedrock offers Long Context variants as separate SKUs at the same price; on the Anthropic API, Long Context is integrated into the standard models.
Prompt Caching rates are also identical across all routes.
| Model | Cache Type | Anthropic API | Bedrock | Vertex AI | Azure |
|---|---|---|---|---|---|
| Claude Opus 4.6 | 5-min cache write | $6.25 | $6.25 | $6.25 | $6.25 |
| | 1-hour cache write | $10.00 | $10.00 | $10.00 | $10.00 |
| | Cache read | $0.50 | $0.50 | $0.50 | $0.50 |
| Claude Sonnet 4.6 | 5-min cache write | $3.75 | $3.75 | $3.75 | $3.75 |
| | 1-hour cache write | $6.00 | $6.00 | $6.00 | $6.00 |
| | Cache read | $0.30 | $0.30 | $0.30 | $0.30 |
| Claude Haiku 4.5 | 5-min cache write | $1.25 | $1.25 | $1.25 | $1.25 |
| | 1-hour cache write | $2.00 | $2.00 | $2.00 | $2.00 |
| | Cache read | $0.10 | $0.10 | $0.10 | $0.10 |
Cache writes come in two TTL tiers: 5-minute (short-term) and 1-hour (long-term). Longer TTL means higher write cost, but for applications with lengthy system prompts that are read repeatedly, the savings on read pricing more than compensate.
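A quick calculation shows where caching starts to pay off. The sketch below uses a deliberately simplified model: the cached prefix is written once and every later request reads it before the TTL expires. Real traffic with cache misses would come out somewhere between the two figures. Prices are passed in as USD per 1M tokens, taken from the tables above.

```python
def cost_without_cache(prompt_tokens, requests, input_price):
    """Every request pays the full input price for the prompt."""
    return prompt_tokens * requests * input_price / 1_000_000


def cost_with_cache(prompt_tokens, requests, write_price, read_price):
    """First request writes the cache; the rest read from it.

    Assumption: the cache never expires between requests.
    """
    if requests <= 0:
        return 0.0
    per_mtok = write_price + (requests - 1) * read_price
    return prompt_tokens * per_mtok / 1_000_000
```

With Claude Sonnet 4.6 rates and a 10,000-token system prompt sent 20 times, the uncached cost is $0.60, against roughly $0.0945 with the 5-minute tier and $0.117 with the 1-hour tier. For a single request, caching costs more than not caching, since only the write surcharge is paid.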
Bedrock, Vertex AI, and the Anthropic API all offer an asynchronous batch API at 50% off on-demand pricing. Azure does not explicitly list batch pricing at this time.
| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |
For large-scale batch workloads (log analysis, embedding generation, etc.), any of these routes can cut costs in half.
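The batch table follows directly from the on-demand prices, as the sketch below illustrates. The on-demand figures are copied from the base pricing table; the function names are made up for this example.

```python
# On-demand (input, output) prices in USD per 1M tokens.
ON_DEMAND = {
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}
BATCH_DISCOUNT = 0.5  # 50% off on-demand pricing


def batch_prices(model):
    """Return the (input, output) batch prices for a model."""
    input_price, output_price = ON_DEMAND[model]
    return input_price * BATCH_DISCOUNT, output_price * BATCH_DISCOUNT


def batch_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a batch job."""
    input_price, output_price = batch_prices(model)
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000
```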
| Feature | Anthropic API | Bedrock | Vertex AI | Azure |
|---|---|---|---|---|
| Base pricing | Same | Same | Same | Same |
| Regional surcharge | — | — | +10% (regional) | — |
| Batch processing (50% off) | ○ | ○ | ○ | Not listed |
| Tokyo region | — | ○ | ○ | — |
| IAM / audit log integration | — | AWS | Google Cloud | Azure |
| VPC / PrivateLink | — | ○ | ○ | ○ |
| Billing integration | Anthropic direct | AWS | Google Cloud | Azure |
| New feature rollout speed | Fastest | Delayed | Delayed | Delayed |
New features (such as Extended Thinking) roll out to the Anthropic API first; Vertex AI, Bedrock, and Azure typically follow weeks later.
Adding "skills" to an AI agent lets you extend its capabilities, just like installing a plugin for an app. This article explains how Agent Skills work and what an agent actually does internally when using them.
First, an AI agent is an AI program that receives instructions and autonomously completes tasks.
Unlike a simple AI that just answers questions (like ChatGPT in basic use), an agent can:
Agent Skills is a mechanism for giving agents new abilities and domain knowledge.
Think of it like handing a new employee a work manual. Once the agent reads the manual (the skill), it understands how to approach that task correctly.
Without skills: "Write a blog post" → Agent writes something generic
With skills: "Write a blog post" → Agent follows the manual and produces consistent, quality output
Skills are primarily written as Markdown files (SKILL.md) and can include:
AI agents are extremely capable, but they don't know anything specific about your project.
For example:
Without skills, agents can't know any of this. Skills let agents understand "the right way to do things" before acting.
Let's look at what's happening inside the agent.
Here are the key points:
The agent reads the skill at the start. The skill content is passed as part of the LLM's input (prompt). The LLM reads this and understands "the right approach for this task."
Based on the instructions, the LLM breaks the task into smaller steps: "Read 3 existing posts first," "then decide on a filename," "then write the frontmatter," and so on.
At each step, the agent calls tools as needed — reading files, searching the web, executing code — following the procedure defined by the skill.
Tool results are passed back to the LLM. The LLM looks at the results and decides what to do next, looping until the task is complete.
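The four steps above can be sketched as a small loop. This is a toy model under stated assumptions, not how any particular agent is implemented: the action format (`{"type": "tool", ...}` / `{"type": "finish", ...}`), the function names, and the scripted stand-in for the LLM are all invented for illustration.

```python
def run_agent(task, skill_text, llm, tools, max_steps=10):
    """Run a minimal agent loop and return the final output."""
    # Step 1: the skill is placed in the prompt before anything else.
    transcript = [f"SKILL:\n{skill_text}", f"TASK:\n{task}"]
    for _ in range(max_steps):
        # Step 2: the LLM decides the next action from everything so far.
        action = llm("\n\n".join(transcript))
        if action["type"] == "finish":
            return action["output"]
        # Step 3: call the requested tool.
        result = tools[action["tool"]](action["input"])
        # Step 4: feed the result back and loop.
        transcript.append(f"TOOL {action['tool']} RESULT:\n{result}")
    raise RuntimeError("step limit reached without finishing")


def scripted_llm(actions):
    """Stand-in for a real model: replays a fixed list of actions."""
    prompts = []
    remaining = iter(actions)

    def llm(prompt):
        prompts.append(prompt)  # record what the model was shown
        return next(remaining)

    return llm, prompts
```

The `max_steps` limit is a design choice for the sketch: without it, a model that never returns a finish action would loop forever.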
Skills can be invoked as slash commands (/command-name).
When a command is called, the corresponding Markdown file's content is expanded as a prompt, and the agent begins executing those steps.
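Command expansion of this kind amounts to a lookup from command name to Markdown text. The sketch below assumes a hypothetical layout in which each command is a `<name>.md` file in one directory, and it appends any arguments after the expanded text; actual tools may store and expand commands differently.

```python
from pathlib import Path


def load_commands(commands_dir):
    """Map command names to the content of <name>.md files (assumed layout)."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in Path(commands_dir).glob("*.md")
    }


def expand_command(user_input, commands):
    """Replace a leading /command-name with its Markdown content."""
    if not user_input.startswith("/"):
        return user_input  # ordinary prompt, nothing to expand
    name, _, args = user_input[1:].partition(" ")
    if name not in commands:
        raise KeyError(f"unknown command: /{name}")
    prompt = commands[name]
    # Assumption: arguments are appended after the expanded text.
    return f"{prompt}\n\n{args}" if args else prompt
```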
The Agent Skills format was developed and open-sourced by Anthropic and is now supported by many tools:
| Tool | Supported |
|---|---|
| Claude Code | ✅ |
| GitHub Copilot | ✅ |
| Cursor | ✅ |
| Gemini CLI | ✅ |
| OpenAI Codex | ✅ |
| VS Code | ✅ |
The biggest advantage is that the same skill can be reused across different tools.
With skills, you no longer have to explain the same things to your AI every time; agents can perform tasks with consistent quality, exactly the way you want.
Now that GPT-5 has been released, I tried calling the API from PowerShell.
$uri = "https://api.openai.com/v1/chat/completions"
$headers = @{
"Authorization" = "Bearer $env:OPENAI_API_KEY"
"Content-Type" = "application/json"
}
$body = @{
model = "gpt-5"
messages = @(
@{
role = "user"
content = "The total cost of a notebook and pencil is 100 yen. A pencil is 40 yen cheaper than a notebook. What is the price of a pencil?"
}
)
} | ConvertTo-Json -Depth 2
$response = Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -Body $body
foreach($choice in $response.choices){
$choice.message.content
}
30 yen
Reason:
- Let x be the price of a notebook and y be the price of a pencil.
- x + y = 100
- y = x - 40
- Substituting, 2x - 40 = 100 → x = 70 → y = 30
- Verification: 70 + 30 = 100, and a pencil is 40 yen cheaper than a notebook.
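The answer is easy to double-check independently of the model. A minimal sketch, solving the two equations directly (the function name is made up for this example):

```python
def solve_prices(total, difference):
    """Solve notebook + pencil = total, notebook - pencil = difference."""
    notebook = (total + difference) / 2
    pencil = (total - difference) / 2
    return notebook, pencil


notebook, pencil = solve_prices(100, 40)
print(notebook, pencil)  # 70.0 30.0
```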