
6 posts tagged with "AI"


GPT-5.4 / GPT-5.4 mini / GPT-5.4 nano / GPT-4o / GPT-4o mini: Pricing and Performance Comparison

· 5 min read

This article compares the pricing, specs, and performance of OpenAI's current API models: GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, GPT-4o, and GPT-4o mini, along with guidance on which model to choose for different use cases.

Unit: USD / 1M tokens (MTok). Information as of April 2026.

Pricing Comparison

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| GPT-5.4 | $2.50 | $1.25 | $15.00 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 |
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 |

GPT-4o mini is the cheapest on both input and output, but its knowledge cutoff is October 2023, making it unsuitable for tasks requiring up-to-date information. GPT-5.4 nano has nearly the same input cost as GPT-4o mini, while offering GPT-5.4 family quality and knowledge up to August 2025. GPT-5.4 (flagship) matches GPT-4o on input cost but has a high output cost of $15.00/MTok, making it best suited for tasks that demand top-quality reasoning.
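Per-request cost follows directly from the table: tokens × rate ÷ 1M. A minimal Python sketch using the rates above (the dictionary keys are just illustrative labels, not official API model IDs):

```python
# Rough per-request cost estimate from the pricing table (USD per 1M tokens).
PRICES = {
    "gpt-5.4":      {"input": 2.50, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
    "gpt-4o":       {"input": 2.50, "output": 10.00},
    "gpt-4o-mini":  {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at on-demand rates (cache pricing ignored)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10K-token prompt producing a 1K-token answer
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

At this shape (10K in / 1K out), the flagship costs $0.04 per request while GPT-5.4 nano costs $0.00325 — roughly a 12× difference, which is why routing simple tasks to the smaller models matters at volume.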

When using regional processing endpoints, a 10% surcharge applies to the GPT-5.4 series.

Specs Comparison

| Model | Context | Max Output | Image Input | Knowledge Cutoff |
| --- | --- | --- | --- | --- |
| GPT-5.4 | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 mini | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 nano | 400K | 128K | ✓ | August 2025 |
| GPT-4o | 128K | 16,384 | ✓ | October 2023 |
| GPT-4o mini | 128K | 16,384 | ✓ | October 2023 |

The GPT-5.4 series dramatically expands the context window to 400K tokens and supports up to 128K tokens of output. GPT-4o and GPT-4o mini are capped at 128K context and 16K output.

Performance Comparison

GPT-5.4

The flagship model of the GPT-5.4 family. It represents the highest intelligence available from OpenAI in the current generation, significantly outperforming GPT-5.4 mini in complex reasoning, long-form generation, and advanced coding. It supports all native tools including computer use, MCP, and web search, with full multimodal input/output support. Given the high output cost of $15.00/MTok, it is most effective when reserved for tasks where top-quality output is essential.

GPT-5.4 mini

The mid-tier model of the GPT-5.4 family, optimized for coding, computer use, and sub-agent tasks. It consistently outperforms GPT-5 mini and achieves pass rates close to the flagship GPT-5.4 with faster processing. Benchmarks show a 2× or greater speed improvement over GPT-5 mini, offering the best performance/latency trade-off for coding workflows.

GPT-5.4 nano

The smallest and most affordable model in the GPT-5.4 family. Optimized for high-volume use cases where speed and cost are the top priorities — such as classification, data extraction, ranking, and coding sub-agents. Not suited for complex tasks requiring deep reasoning.

GPT-4o

The general-purpose flagship model with high intelligence for both text and image tasks. It is now a legacy model, superseded by the GPT-5.4 series. GPT-4o was retired from ChatGPT in February 2026, but API access remains available.

GPT-4o mini

Designed as a compact model ideal for fine-tuning. Achieves results comparable to larger models (GPT-4o) at lower cost and latency through distillation. MMLU score: 82.0%. Best suited for minimizing inference costs on simple tasks.

Which Model to Choose

  • High-volume / cost-first: GPT-5.4 nano or GPT-4o mini. Choose GPT-5.4 nano if up-to-date knowledge is required; GPT-4o mini if fine-tuning is needed.
  • Coding and agents: GPT-5.4 mini. The best balance of speed and accuracy.
  • Complex reasoning / high-quality output: GPT-5.4. High cost at $2.50 input / $15.00 output per MTok, but delivers the best output quality of the current generation.
  • Legacy system compatibility: GPT-4o. API access remains available, allowing existing integrations to continue.

Best Value Options

For cost-effectiveness, the two standout models are GPT-5.4 nano and GPT-5.4 mini.

GPT-5.4 nano has nearly the same input cost as GPT-4o mini ($0.20 vs $0.15), yet offers a 400K context window, knowledge up to August 2025, and full access to native tools such as web search, file search, and MCP. It surpasses GPT-4o mini in almost every dimension except knowledge cutoff, so switching to GPT-5.4 nano makes sense for any use case that doesn't require fine-tuning.

GPT-5.4 mini is cheaper on input ($0.75/MTok vs $2.50/MTok) than GPT-4o while outperforming it in coding and agentic workflows. If you regularly use GPT-4o, switching to GPT-5.4 mini is likely to reduce costs while improving performance.

On the other hand, GPT-4o now feels overpriced. Its input cost matches GPT-5.4 ($2.50/MTok), yet it falls behind in context size, knowledge recency, and tool support. Unless you specifically need fine-tuning or compatibility with existing systems, there is little reason to actively choose GPT-4o.


Inspecting AI Coding Tool Traffic with mitmproxy

· 3 min read

By setting up mitmproxy as a man-in-the-middle proxy, you can monitor the API traffic that AI coding tools make in real time.

How It Works

Many AI coding tools are Node.js applications that communicate with external APIs over HTTPS. By inserting mitmproxy as a man-in-the-middle proxy and configuring Node.js to trust the mitmproxy CA certificate, you can decrypt the encrypted traffic and inspect it in real time.

Installation

The easiest way to install mitmproxy is via uv.

uv tool install mitmproxy

Proxy Configuration

Set the following environment variables before launching the tool.

$env:HTTPS_PROXY = "http://127.0.0.1:8080"
$env:HTTP_PROXY = "http://127.0.0.1:8080"
$env:NODE_EXTRA_CA_CERTS = "$env:USERPROFILE\.mitmproxy\mitmproxy-ca-cert.pem"

About NODE_EXTRA_CA_CERTS

Setting only HTTPS_PROXY and HTTP_PROXY will cause Node.js TLS verification to fail. mitmproxy uses a self-signed certificate when relaying HTTPS traffic, which Node.js rejects by default.

By specifying the path to the mitmproxy CA certificate in NODE_EXTRA_CA_CERTS, Node.js will trust it and the connection will succeed.

Generating the CA Certificate

mitmproxy automatically generates a CA certificate on first launch and saves it to ~\.mitmproxy\. If you haven't generated it yet, simply start mitmproxy once.

mitmweb

A browser window opens automatically, showing the proxy management UI at http://127.0.0.1:8081.

Launching the Tool

Launch the tool in a separate terminal with the environment variables set.

Once you start using the tool, requests will appear in the mitmweb UI.

Captured Traffic

Endpoint

The tool sends requests to the following endpoint.

POST https://api.example.com/v1/messages

Request Headers

x-service-version: ...
content-type: application/json
x-api-key: sk-...

Request Body

{
  "model": "model-name",
  "max_tokens": 16000,
  "stream": true,
  "system": [
    {
      "type": "text",
      "text": "..."
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "..."
    }
  ],
  "tools": [
    {
      "name": "Read",
      "description": "...",
      "input_schema": {}
    }
  ]
}

Streaming responses are received in Server-Sent Events (SSE) format with stream: true.

Response

data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
...
data: {"type":"message_stop"}
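As a sketch of how a client reassembles this stream, here is a small Python parser for SSE lines shaped like the capture above (a real client would also handle the other event types and partial lines):

```python
import json

def text_from_sse(sse_body: str) -> str:
    """Reassemble streamed text from 'data:' lines carrying content_block_delta events."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads or truncated lines
        if event.get("type") == "content_block_delta":
            parts.append(event["delta"].get("text", ""))
    return "".join(parts)

sample = "\n".join([
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}',
    'data: {"type":"message_stop"}',
])
print(text_from_sse(sample))  # Hello, world
```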

What You Can Learn

| Item | Details |
| --- | --- |
| API endpoint | api.example.com/v1/messages |
| Authentication | API key (x-api-key header) |
| Streaming | SSE format |
| Tool definitions | Included in every request |
| System prompt | Thousands to tens of thousands of tokens |

The system prompt contains the AI coding tool's operating principles, descriptions of available tools, and usage guidelines.

Summary

  • The key is trusting the mitmproxy CA certificate in Node.js via NODE_EXTRA_CA_CERTS
  • API communication uses SSE streaming
  • You can inspect the internal structure of AI coding tools, including tool definitions and the system prompt

Calling the Vertex AI Gemini API from PowerShell

· 4 min read

This covers how to call Gemini models via Google Cloud's Vertex AI from PowerShell. Both the OpenAI-compatible endpoint and the native Gemini endpoint are explained.

Authentication

No API key required. Uses your existing Google Cloud credentials.

$accessToken = (gcloud auth print-access-token)

API key

$apiKey = $env:VERTEX_API_KEY

Endpoints

OpenAI-compatible endpoint (gcloud auth)

https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions

The request and response format is identical to the OpenAI API. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).

Native Gemini endpoint (API key)

https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent

For streaming, use :streamGenerateContent.

Basic calls

OpenAI-compatible (gcloud auth)

$projectId = "your-project-id"
$region = "us-central1"
$model = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)

$body = @{
    model = $model
    messages = @(
        @{
            role = "user"
            content = "What is the population of Tokyo?"
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Headers @{ Authorization = "Bearer $accessToken" } `
    -Body $body

$response.choices[0].message.content

Native Gemini (API key)

$projectId = "your-project-id"
$region = "us-central1"
$model = "gemini-2.5-flash-lite"
$apiKey = $env:VERTEX_API_KEY

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(
                @{ text = "What is the population of Tokyo?" }
            )
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

$response.candidates[0].content.parts[0].text

Response structure

OpenAI-compatible

$response.choices[0].message.content # generated text
$response.usage.total_tokens # total token count
$response.model # model used

Native Gemini

$response.candidates[0].content.parts[0].text # generated text
$response.usageMetadata.totalTokenCount # total token count
$response.modelVersion # model version used

For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.

$fullText = ($response | ForEach-Object {
    $_.candidates[0].content.parts[0].text
}) -join ""

Adding a system prompt

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{
            role = "system"
            content = "You are an AI assistant that responds in Japanese. Answer concisely."
        }
        @{
            role = "user"
            content = "What is the speed of light?"
        }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

$body = @{
    system_instruction = @{
        parts = @(
            @{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
        )
    }
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "What is the speed of light?" })
        }
    )
} | ConvertTo-Json -Depth 10

Multi-turn conversation

Pass the conversation history as an array to hold a multi-turn conversation.

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{ role = "user"; content = "Do you prefer cats or dogs?" }
        @{ role = "assistant"; content = "I prefer cats." }
        @{ role = "user"; content = "Why is that?" }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

The assistant role is specified as "model".

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "Do you prefer cats or dogs?" })
        }
        @{
            role = "model"
            parts = @(@{ text = "I prefer cats." })
        }
        @{
            role = "user"
            parts = @(@{ text = "Why is that?" })
        }
    )
} | ConvertTo-Json -Depth 10

Available models

| Model | OpenAI-compatible name | Description |
| --- | --- | --- |
| gemini-2.5-flash-lite | google/gemini-2.5-flash-lite | Lightweight, fast, low-cost |
| gemini-2.5-flash | google/gemini-2.5-flash | Balanced |
| gemini-2.5-pro | google/gemini-2.5-pro | High-precision, for complex tasks |

Which approach to use

| Situation | Recommended approach |
| --- | --- |
| GCP-authenticated environment (dev, CI, etc.) | OpenAI-compatible + gcloud auth |
| Only an API key available | Native Gemini |
| Migrating from OpenAI | OpenAI-compatible (minimizes code changes) |
| Streaming required | Native Gemini |

Notes

  • Do not hardcode the API key in scripts; load it from an environment variable ($env:VERTEX_API_KEY).
  • With gcloud auth, tokens expire in about 1 hour. Long-running scripts should refresh the token as needed.
  • Each project has its own rate limits and quotas. Check them before sending large numbers of requests.

Comparing Anthropic API and AWS Bedrock Pricing

· 3 min read

When using Claude via API, you have several options: in addition to calling the Anthropic API directly, you can also use it via AWS Bedrock, Google Vertex AI, or Microsoft Azure (Azure AI Foundry). Base pricing is the same across all routes, but there are differences in batch processing and cloud ecosystem integration.

Unit: USD / 1M tokens (MTok). Information as of March 2026.

On-Demand Base Pricing

| Model | Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Input | $5.00 | $5.00 | $5.00 | $5.00 |
| | Output | $25.00 | $25.00 | $25.00 | $25.00 |
| Claude Sonnet 4.6 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |
| Claude Haiku 4.5 | Input | $1.00 | $1.00 | $1.00 | $1.00 |
| | Output | $5.00 | $5.00 | $5.00 | $5.00 |
| Claude Sonnet 4.5 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |

Base pricing is identical across all routes.

Note that Vertex AI regional endpoints carry a 10% surcharge over global endpoint pricing. Bedrock offers Long Context variants as separate SKUs at the same price; on the Anthropic API, Long Context is integrated into the standard models.

Cache Pricing

Prompt Caching rates are also identical across all routes.

| Model | Cache Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 5-min cache write | $6.25 | $6.25 | $6.25 | $6.25 |
| | 1-hour cache write | $10.00 | $10.00 | $10.00 | $10.00 |
| | Cache read | $0.50 | $0.50 | $0.50 | $0.50 |
| Claude Sonnet 4.6 | 5-min cache write | $3.75 | $3.75 | $3.75 | $3.75 |
| | 1-hour cache write | $6.00 | $6.00 | $6.00 | $6.00 |
| | Cache read | $0.30 | $0.30 | $0.30 | $0.30 |
| Claude Haiku 4.5 | 5-min cache write | $1.25 | $1.25 | $1.25 | $1.25 |
| | 1-hour cache write | $2.00 | $2.00 | $2.00 | $2.00 |
| | Cache read | $0.10 | $0.10 | $0.10 | $0.10 |

Cache writes come in two TTL tiers: 5-minute (short-term) and 1-hour (long-term). Longer TTL means higher write cost, but for applications with lengthy system prompts that are read repeatedly, the savings on read pricing more than compensate.
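The trade-off is easy to quantify. A rough break-even sketch in Python, using the Claude Sonnet 4.6 rates from the table (normal input $3.00, 5-min cache write $3.75, cache read $0.30 per MTok):

```python
import math

# Break-even for prompt caching at Claude Sonnet 4.6 rates (USD per MTok).
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def breakeven_reads() -> int:
    """Number of cache reads needed before caching beats resending the prompt."""
    extra_write_cost = CACHE_WRITE - INPUT   # one-time premium: $0.75/MTok
    saving_per_read = INPUT - CACHE_READ     # saved on each later read: $2.70/MTok
    return math.ceil(extra_write_cost / saving_per_read)

print(breakeven_reads())
```

At these rates a single cache hit already recoups the write premium, so caching pays off for almost any prompt that is reused within the TTL.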

Batch Processing Pricing

Bedrock, Vertex AI, and the Anthropic API all offer an asynchronous batch API at 50% off on-demand pricing. Azure does not explicitly list batch pricing at this time.

| Model | Batch Input | Batch Output |
| --- | --- | --- |
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |

For large-scale batch workloads (log analysis, embedding generation, etc.), any of these routes can cut costs in half.

Ecosystem Comparison

| Feature | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- |
| Base pricing | Same | Same | Same | Same |
| Regional surcharge | | | +10% (regional) | |
| Batch processing (50% off) | ✓ | ✓ | ✓ | Not listed |
| Tokyo region | | ✓ | | |
| IAM / audit log integration | | AWS | Google Cloud | Azure |
| VPC / PrivateLink | | ✓ | | |
| Billing integration | Anthropic direct | AWS | Google Cloud | Azure |
| New feature rollout speed | Fastest | Delayed | Delayed | Delayed |

New features (such as Extended Thinking) roll out to the Anthropic API first; Vertex AI, Bedrock, and Azure typically follow weeks later.

Which Should You Choose?

  • Simple setup / prototyping: Anthropic API requires just one API key and gets new features first.
  • Deep AWS integration: If you need IAM, CloudWatch, or VPC, Bedrock is the natural choice. Tokyo region supported.
  • Deep Google Cloud integration: Vertex AI fits right in. Note the 10% surcharge on regional endpoints.
  • Deep Azure integration: Available via Azure AI Foundry, integrated with Azure billing and management.
  • Heavy batch workloads: Bedrock, Vertex AI, and the Anthropic API all offer 50% off batch pricing.


What Are AI Agent Skills? How They Work, Explained Simply

· 4 min read

Adding "skills" to an AI agent lets you extend its capabilities, just like installing a plugin for an app. This article explains how Agent Skills work and what an agent actually does internally when using them.

What Is an AI Agent?

First, an AI agent is an AI program that receives instructions and autonomously completes tasks.

Unlike a simple AI that just answers questions (like ChatGPT in basic use), an agent can:

  • Read and write files
  • Execute code and check results
  • Call external APIs and tools
  • Make decisions across multiple steps on its own

What Are Skills?

Agent Skills is a mechanism for giving agents new abilities and domain knowledge.

Think of it like handing a new employee a work manual. Once the agent reads the manual (the skill), it understands how to approach that task correctly.

Without skills: "Write a blog post" → Agent writes something generic
With skills: "Write a blog post" → Agent follows the manual and produces consistent, quality output

Skills are primarily written as Markdown files (SKILL.md) and can include:

  • Step-by-step procedures: What to do and in what order
  • Scripts: Automatable processes
  • Samples and config: Resources for the agent to reference
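As a hypothetical illustration (the exact frontmatter fields your tool expects may differ), a minimal SKILL.md for a blog-writing skill might look like:

```markdown
---
name: blog-post
description: Write a blog post that follows this site's conventions
---

# Writing a blog post

1. Read three recent posts to match the site's tone and structure.
2. Decide on a filename that follows the existing naming pattern.
3. Write the frontmatter first, then the body.
4. Build the site and fix any errors before finishing.
```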

Why Are Skills Needed?

AI agents are extremely capable, but they don't know anything specific about your project.

For example:

  • "How does this team write commit messages?"
  • "What frontmatter format does this blog use?"
  • "Which commands are used for deployment?"

Without skills, agents can't know any of this. Skills let agents understand "the right way to do things" before acting.

How an Agent Processes a Skill

Let's look at what's happening inside the agent.

Here are the key points:

1. Loading the Skill

The agent reads the skill at the start. The skill content is passed as part of the LLM's input (prompt). The LLM reads this and understands "the right approach for this task."

2. Breaking Down the Task

Based on the instructions, the LLM breaks the task into smaller steps: "Read 3 existing posts first," "then decide on a filename," "then write the frontmatter," and so on.

3. Calling Tools

At each step, the agent calls tools as needed — reading files, searching the web, executing code — following the procedure defined by the skill.

4. Feeding Back Results

Tool results are passed back to the LLM. The LLM looks at the results and decides what to do next, looping until the task is complete.
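The four steps above amount to a single loop. Here is a toy Python sketch of that loop; `fake_llm` and `read_file` are stand-in stubs for illustration, not a real model API or agent framework:

```python
# Toy agent loop: skill text goes into the prompt, the LLM picks tool calls,
# tool results are fed back, and the loop runs until the LLM signals "final".
def fake_llm(history):
    """Stub LLM: asks to read a file once, then finishes."""
    if not any(msg["role"] == "tool" for msg in history):
        return {"action": "tool", "tool": "read_file", "arg": "notes.txt"}
    return {"action": "final", "text": "done"}

TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_agent(skill: str, task: str) -> str:
    # 1. Loading the skill: its text is simply part of the prompt.
    history = [{"role": "system", "content": skill},
               {"role": "user", "content": task}]
    while True:
        step = fake_llm(history)          # 2-3. decide next step / tool call
        if step["action"] == "final":
            return step["text"]
        result = TOOLS[step["tool"]](step["arg"])
        history.append({"role": "tool", "content": result})  # 4. feed back

print(run_agent("Always read notes.txt first.", "Summarize my notes"))  # done
```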

Skill Commands

Skills can be invoked as slash commands (/command-name).

When a command is called, the corresponding Markdown file's content is expanded as a prompt, and the agent begins executing those steps.

Skills Are Growing

The Agent Skills format was developed and open-sourced by Anthropic and is now supported by many tools:

| Tool | Supported |
| --- | --- |
| Claude Code | ✓ |
| GitHub Copilot | ✓ |
| Cursor | ✓ |
| Gemini CLI | ✓ |
| OpenAI Codex | ✓ |
| VS Code | ✓ |

The biggest advantage is that the same skill can be reused across different tools.

Summary

  • Skills are a mechanism for giving agents specialized knowledge and procedures
  • You can create one by writing steps and rules in a Markdown file (SKILL.md)
  • The agent receives the skill as a prompt; the LLM interprets it and executes each step
  • It's an open standard supported by Claude Code, Cursor, GitHub Copilot, and many more

With skills, you no longer have to explain the same things to your AI every time — agents can perform tasks with consistent quality, exactly the way you want.

Interacting with GPT-5 via API

· One min read

GPT-5 has been released, so I tried calling the API from PowerShell.

Code

$uri = "https://api.openai.com/v1/chat/completions"
$headers = @{
    "Authorization" = "Bearer $env:OPENAI_API_KEY"
    "Content-Type" = "application/json"
}

$body = @{
    model = "gpt-5"
    messages = @(
        @{
            role = "user"
            content = "The total cost of a notebook and pencil is 100 yen. A pencil is 40 yen cheaper than a notebook. What is the price of a pencil?"
        }
    )
} | ConvertTo-Json -Depth 10

$response = Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -Body $body

foreach ($choice in $response.choices) {
    $choice.message.content
}

Output

30 yen

Reason:
- Let x be the price of a notebook and y be the price of a pencil.
- x + y = 100
- y = x - 40
- Substituting, 2x - 40 = 100 → x = 70 → y = 30
- Verification: 70 + 30 = 100, and a pencil is 40 yen cheaper than a notebook.
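The model's algebra checks out; a quick brute-force verification of the two equations in Python:

```python
def solve():
    """Brute-force the system: x + y = 100 and y = x - 40 (prices in yen)."""
    for x in range(101):
        y = 100 - x
        if y == x - 40:
            return x, y

notebook, pencil = solve()
print(f"notebook = {notebook} yen, pencil = {pencil} yen")  # notebook = 70 yen, pencil = 30 yen
```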