
6 posts tagged with "AI"


GPT-5.4 / GPT-5.4 mini / GPT-5.4 nano / GPT-4o / GPT-4o mini: Pricing and Performance Comparison

· 5 min read

This article compares the pricing, specs, and performance of OpenAI's current API models: GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, GPT-4o, and GPT-4o mini, along with guidance on which model to choose for different use cases.

Unit: USD / 1M tokens (MTok). Information as of April 2026.

Pricing Comparison

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| GPT-5.4 | $2.50 | $1.25 | $15.00 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 |
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 |

GPT-4o mini is the cheapest on both input and output, but its knowledge cutoff is October 2023, making it unsuitable for tasks requiring up-to-date information. GPT-5.4 nano has nearly the same input cost as GPT-4o mini, while offering GPT-5.4 family quality and knowledge up to August 2025. GPT-5.4 (flagship) matches GPT-4o on input cost but has a high output cost of $15.00/MTok, making it best suited for tasks that demand top-quality reasoning.
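Per-request cost follows directly from the table: tokens × rate ÷ 1M. A minimal Python sketch using the rates above (the dictionary keys are just illustrative labels, not official API model IDs):

```python
# Rough per-request cost estimate from the pricing table (USD per 1M tokens).
PRICES = {
    "gpt-5.4":      {"input": 2.50, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
    "gpt-4o":       {"input": 2.50, "output": 10.00},
    "gpt-4o-mini":  {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at on-demand rates (cache pricing ignored)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10K-token prompt producing a 1K-token answer
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

At this shape (10K in / 1K out), the flagship costs $0.04 per request while GPT-5.4 nano costs $0.00325 — roughly a 12× difference, which is why routing simple tasks to the smaller models matters at volume.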

When using regional processing endpoints, a 10% surcharge applies to the GPT-5.4 series.

Specs Comparison

| Model | Context | Max Output | Image Input | Knowledge Cutoff |
| --- | --- | --- | --- | --- |
| GPT-5.4 | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 mini | 400K | 128K | ✓ | August 2025 |
| GPT-5.4 nano | 400K | 128K | ✓ | August 2025 |
| GPT-4o | 128K | 16,384 | ✓ | October 2023 |
| GPT-4o mini | 128K | 16,384 | ✓ | October 2023 |

The GPT-5.4 series dramatically expands the context window to 400K tokens and supports up to 128K tokens of output. GPT-4o and GPT-4o mini are capped at 128K context and 16K output.

Performance Comparison

GPT-5.4

The flagship model of the GPT-5.4 family. It represents the highest intelligence available from OpenAI in the current generation, significantly outperforming GPT-5.4 mini in complex reasoning, long-form generation, and advanced coding. It supports all native tools including computer use, MCP, and web search, with full multimodal input/output support. Given the high output cost of $15.00/MTok, it is most effective when reserved for tasks where top-quality output is essential.

GPT-5.4 mini

The mid-tier model of the GPT-5.4 family, optimized for coding, computer use, and sub-agent tasks. It consistently outperforms GPT-5 mini and achieves pass rates close to the flagship GPT-5.4 with faster processing. Benchmarks show a 2× or greater speed improvement over GPT-5 mini, offering the best performance/latency trade-off for coding workflows.

GPT-5.4 nano

The smallest and most affordable model in the GPT-5.4 family. Optimized for high-volume use cases where speed and cost are the top priorities — such as classification, data extraction, ranking, and coding sub-agents. Not suited for complex tasks requiring deep reasoning.

GPT-4o

The general-purpose flagship model with high intelligence for both text and image tasks. It is now a legacy model, superseded by the GPT-5.4 series. GPT-4o was retired from ChatGPT in February 2026, but API access remains available.

GPT-4o mini

Designed as a compact model ideal for fine-tuning. Achieves results comparable to larger models (GPT-4o) at lower cost and latency through distillation. MMLU score: 82.0%. Best suited for minimizing inference costs on simple tasks.

Which Model to Choose

  • High-volume / cost-first: GPT-5.4 nano or GPT-4o mini. Choose GPT-5.4 nano if up-to-date knowledge is required; GPT-4o mini if fine-tuning is needed.
  • Coding and agents: GPT-5.4 mini. The best balance of speed and accuracy.
  • Complex reasoning / high-quality output: GPT-5.4. High cost at $2.50 input / $15.00 output per MTok, but delivers the best output quality of the current generation.
  • Legacy system compatibility: GPT-4o. API access remains available, allowing existing integrations to continue.

Best Value Options

For cost-effectiveness, the two standout models are GPT-5.4 nano and GPT-5.4 mini.

GPT-5.4 nano has nearly the same input cost as GPT-4o mini ($0.20 vs $0.15), yet offers a 400K context window, knowledge up to August 2025, and full access to native tools such as web search, file search, and MCP. It surpasses GPT-4o mini in almost every dimension except knowledge cutoff, so switching to GPT-5.4 nano makes sense for any use case that doesn't require fine-tuning.

GPT-5.4 mini is cheaper on input ($0.75/MTok vs $2.50/MTok) than GPT-4o while outperforming it in coding and agentic workflows. If you regularly use GPT-4o, switching to GPT-5.4 mini is likely to reduce costs while improving performance.

On the other hand, GPT-4o now feels overpriced. Its input cost matches GPT-5.4 ($2.50/MTok), yet it falls behind in context size, knowledge recency, and tool support. Unless you specifically need fine-tuning or compatibility with existing systems, there is little reason to actively choose GPT-4o.


Inspecting AI Coding Tool Traffic with mitmproxy

· 3 min read

By setting up mitmproxy as a man-in-the-middle proxy, you can monitor the API traffic that AI coding tools make in real time.

How It Works

Many AI coding tools are Node.js applications that communicate with external APIs over HTTPS. By inserting mitmproxy as a man-in-the-middle proxy and configuring Node.js to trust the mitmproxy CA certificate, you can decrypt the encrypted traffic and inspect it in real time.

Installation

The easiest way to install mitmproxy is via uv.

uv tool install mitmproxy

Proxy Configuration

Set the following environment variables before launching the tool.

$env:HTTPS_PROXY = "http://127.0.0.1:8080"
$env:HTTP_PROXY = "http://127.0.0.1:8080"
$env:NODE_EXTRA_CA_CERTS = "$env:USERPROFILE\.mitmproxy\mitmproxy-ca-cert.pem"

About NODE_EXTRA_CA_CERTS

Setting only HTTPS_PROXY and HTTP_PROXY will cause Node.js TLS verification to fail. mitmproxy uses a self-signed certificate when relaying HTTPS traffic, which Node.js rejects by default.

By specifying the path to the mitmproxy CA certificate in NODE_EXTRA_CA_CERTS, Node.js will trust it and the connection will succeed.

Generating the CA Certificate

mitmproxy automatically generates a CA certificate on first launch and saves it to ~\.mitmproxy\. If you haven't generated it yet, simply start mitmproxy once.

mitmweb

A browser window opens automatically, showing the proxy management UI at http://127.0.0.1:8081.

Launching the Tool

Launch the tool in a separate terminal with the environment variables set.

Once you start using the tool, requests will appear in the mitmweb UI.

Captured Traffic

Endpoint

The tool sends requests to the following endpoint.

POST https://api.example.com/v1/messages

Request Headers

x-service-version: ...
content-type: application/json
x-api-key: sk-...

Request Body

{
  "model": "model-name",
  "max_tokens": 16000,
  "stream": true,
  "system": [
    {
      "type": "text",
      "text": "..."
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "..."
    }
  ],
  "tools": [
    {
      "name": "Read",
      "description": "...",
      "input_schema": {}
    }
  ]
}

Streaming responses are received in Server-Sent Events (SSE) format with stream: true.

Response

data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
...
data: {"type":"message_stop"}
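As a sketch of how a client reassembles this stream, here is a small Python parser for SSE lines shaped like the capture above (a real client would also handle the other event types and partial lines):

```python
import json

def text_from_sse(sse_body: str) -> str:
    """Reassemble streamed text from 'data:' lines carrying content_block_delta events."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads or truncated lines
        if event.get("type") == "content_block_delta":
            parts.append(event["delta"].get("text", ""))
    return "".join(parts)

sample = "\n".join([
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}',
    'data: {"type":"message_stop"}',
])
print(text_from_sse(sample))  # Hello, world
```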

What You Can Learn

| Item | Details |
| --- | --- |
| API endpoint | api.example.com/v1/messages |
| Authentication | API key (x-api-key header) |
| Streaming | SSE format |
| Tool definitions | Included in every request |
| System prompt | Thousands to tens of thousands of tokens |

The system prompt contains the AI coding tool's operating principles, descriptions of available tools, and usage guidelines.

Summary

  • The key is trusting the mitmproxy CA certificate in Node.js via NODE_EXTRA_CA_CERTS
  • API communication uses SSE streaming
  • You can inspect the internal structure of AI coding tools, including tool definitions and the system prompt

Calling the Vertex AI Gemini API from PowerShell

· 4 min read

This covers how to call Gemini models via Google Cloud's Vertex AI from PowerShell. Both the OpenAI-compatible endpoint and the native Gemini endpoint are explained.

Authentication

No API key required. Uses your existing Google Cloud credentials.

$accessToken = (gcloud auth print-access-token)

API key

$apiKey = $env:VERTEX_API_KEY

Endpoints

OpenAI-compatible endpoint (gcloud auth)

https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions

The request and response format is identical to the OpenAI API. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).

Native Gemini endpoint (API key)

https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent

For streaming, use :streamGenerateContent.

Basic calls

OpenAI-compatible (gcloud auth)

$projectId = "your-project-id"
$region = "us-central1"
$model = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)

$body = @{
    model = $model
    messages = @(
        @{
            role = "user"
            content = "What is the population of Tokyo?"
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Headers @{ Authorization = "Bearer $accessToken" } `
    -Body $body

$response.choices[0].message.content

Native Gemini (API key)

$projectId = "your-project-id"
$region = "us-central1"
$model = "gemini-2.5-flash-lite"
$apiKey = $env:VERTEX_API_KEY

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(
                @{ text = "What is the population of Tokyo?" }
            )
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

$response.candidates[0].content.parts[0].text

Response structure

OpenAI-compatible

$response.choices[0].message.content # generated text
$response.usage.total_tokens # total token count
$response.model # model used

Native Gemini

$response.candidates[0].content.parts[0].text # generated text
$response.usageMetadata.totalTokenCount # total token count
$response.modelVersion # model version used

For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.

$fullText = ($response | ForEach-Object {
    $_.candidates[0].content.parts[0].text
}) -join ""

Adding a system prompt

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{
            role = "system"
            content = "You are an AI assistant that responds in Japanese. Answer concisely."
        }
        @{
            role = "user"
            content = "What is the speed of light?"
        }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

$body = @{
    system_instruction = @{
        parts = @(
            @{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
        )
    }
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "What is the speed of light?" })
        }
    )
} | ConvertTo-Json -Depth 10

Multi-turn conversation

Pass the conversation history as an array to hold a multi-turn conversation.

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{ role = "user"; content = "Do you prefer cats or dogs?" }
        @{ role = "assistant"; content = "I prefer cats." }
        @{ role = "user"; content = "Why is that?" }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

The assistant role is specified as "model".

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "Do you prefer cats or dogs?" })
        }
        @{
            role = "model"
            parts = @(@{ text = "I prefer cats." })
        }
        @{
            role = "user"
            parts = @(@{ text = "Why is that?" })
        }
    )
} | ConvertTo-Json -Depth 10

Available models

| Model | OpenAI-compatible name | Description |
| --- | --- | --- |
| gemini-2.5-flash-lite | google/gemini-2.5-flash-lite | Lightweight, fast, low-cost |
| gemini-2.5-flash | google/gemini-2.5-flash | Balanced |
| gemini-2.5-pro | google/gemini-2.5-pro | High-precision, for complex tasks |

Which approach to use

| Situation | Recommended approach |
| --- | --- |
| GCP-authenticated environment (dev, CI, etc.) | OpenAI-compatible + gcloud auth |
| Only an API key available | Native Gemini |
| Migrating from OpenAI | OpenAI-compatible (minimizes code changes) |
| Streaming required | Native Gemini |

Notes

  • Do not hardcode the API key in scripts; load it from an environment variable ($env:VERTEX_API_KEY).
  • With gcloud auth, tokens expire in about 1 hour. Long-running scripts should refresh the token as needed.
  • Each project has its own rate limits and quotas. Check them before sending large numbers of requests.

Comparing Anthropic API and AWS Bedrock Pricing

· 3 min read

When using Claude via API, you have several options: in addition to calling the Anthropic API directly, you can also use it via AWS Bedrock, Google Vertex AI, or Microsoft Azure (Azure AI Foundry). Base pricing is the same across all routes, but there are differences in batch processing and cloud ecosystem integration.

Unit: USD / 1M tokens (MTok). Information as of March 2026.

On-Demand Base Pricing

| Model | Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Input | $5.00 | $5.00 | $5.00 | $5.00 |
| | Output | $25.00 | $25.00 | $25.00 | $25.00 |
| Claude Sonnet 4.6 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |
| Claude Haiku 4.5 | Input | $1.00 | $1.00 | $1.00 | $1.00 |
| | Output | $5.00 | $5.00 | $5.00 | $5.00 |
| Claude Sonnet 4.5 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| | Output | $15.00 | $15.00 | $15.00 | $15.00 |

Base pricing is identical across all routes.

Note that Vertex AI regional endpoints carry a 10% surcharge over global endpoint pricing. Bedrock offers Long Context variants as separate SKUs at the same price; on the Anthropic API, Long Context is integrated into the standard models.

Cache Pricing

Prompt Caching rates are also identical across all routes.

| Model | Cache Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 5-min cache write | $6.25 | $6.25 | $6.25 | $6.25 |
| | 1-hour cache write | $10.00 | $10.00 | $10.00 | $10.00 |
| | Cache read | $0.50 | $0.50 | $0.50 | $0.50 |
| Claude Sonnet 4.6 | 5-min cache write | $3.75 | $3.75 | $3.75 | $3.75 |
| | 1-hour cache write | $6.00 | $6.00 | $6.00 | $6.00 |
| | Cache read | $0.30 | $0.30 | $0.30 | $0.30 |
| Claude Haiku 4.5 | 5-min cache write | $1.25 | $1.25 | $1.25 | $1.25 |
| | 1-hour cache write | $2.00 | $2.00 | $2.00 | $2.00 |
| | Cache read | $0.10 | $0.10 | $0.10 | $0.10 |

Cache writes come in two TTL tiers: 5-minute (short-term) and 1-hour (long-term). Longer TTL means higher write cost, but for applications with lengthy system prompts that are read repeatedly, the savings on read pricing more than compensate.
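The trade-off is easy to quantify. A rough break-even sketch in Python, using the Claude Sonnet 4.6 rates from the table (normal input $3.00, 5-min cache write $3.75, cache read $0.30 per MTok):

```python
import math

# Break-even for prompt caching at Claude Sonnet 4.6 rates (USD per MTok).
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def breakeven_reads() -> int:
    """Number of cache reads needed before caching beats resending the prompt."""
    extra_write_cost = CACHE_WRITE - INPUT   # one-time premium: $0.75/MTok
    saving_per_read = INPUT - CACHE_READ     # saved on each later read: $2.70/MTok
    return math.ceil(extra_write_cost / saving_per_read)

print(breakeven_reads())
```

At these rates a single cache hit already recoups the write premium, so caching pays off for almost any prompt that is reused within the TTL.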

Batch Processing Pricing

Bedrock, Vertex AI, and the Anthropic API all offer an asynchronous batch API at 50% off on-demand pricing. Azure does not explicitly list batch pricing at this time.

| Model | Batch Input | Batch Output |
| --- | --- | --- |
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |

For large-scale batch workloads (log analysis, embedding generation, etc.), any of these routes can cut costs in half.

Ecosystem Comparison

| Feature | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- |
| Base pricing | Same | Same | Same | Same |
| Regional surcharge | | | +10% (regional) | |
| Batch processing (50% off) | ✓ | ✓ | ✓ | Not listed |
| Tokyo region | | ✓ | | |
| IAM / audit log integration | | AWS | Google Cloud | Azure |
| VPC / PrivateLink | | ✓ | | |
| Billing integration | Anthropic direct | AWS | Google Cloud | Azure |
| New feature rollout speed | Fastest | Delayed | Delayed | Delayed |

New features (such as Extended Thinking) roll out to the Anthropic API first; Vertex AI, Bedrock, and Azure typically follow weeks later.

Which Should You Choose?

  • Simple setup / prototyping: Anthropic API requires just one API key and gets new features first.
  • Deep AWS integration: If you need IAM, CloudWatch, or VPC, Bedrock is the natural choice. Tokyo region supported.
  • Deep Google Cloud integration: Vertex AI fits right in. Note the 10% surcharge on regional endpoints.
  • Deep Azure integration: Available via Azure AI Foundry, integrated with Azure billing and management.
  • Heavy batch workloads: Bedrock, Vertex AI, and the Anthropic API all offer 50% off batch pricing.


What Are AI Agent Skills? How They Work, Explained Simply

· 4 min read

Adding "skills" to an AI agent lets you extend its capabilities, just like installing a plugin for an app. This article explains how Agent Skills work and what an agent actually does internally when using them.

What Is an AI Agent?

First, an AI agent is an AI program that receives instructions and autonomously completes tasks.

Unlike a simple AI that just answers questions (like ChatGPT in basic use), an agent can:

  • Read and write files
  • Execute code and check results
  • Call external APIs and tools
  • Make decisions across multiple steps on its own

What Are Skills?

Agent Skills is a mechanism for giving agents new abilities and domain knowledge.

Think of it like handing a new employee a work manual. Once the agent reads the manual (the skill), it understands how to approach that task correctly.

Without skills: "Write a blog post" → Agent writes something generic
With skills: "Write a blog post" → Agent follows the manual and produces consistent, quality output

Skills are primarily written as Markdown files (SKILL.md) and can include:

  • Step-by-step procedures: What to do and in what order
  • Scripts: Automatable processes
  • Samples and config: Resources for the agent to reference
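As a hypothetical illustration (the exact frontmatter fields your tool expects may differ), a minimal SKILL.md for a blog-writing skill might look like:

```markdown
---
name: blog-post
description: Write a blog post that follows this site's conventions
---

# Writing a blog post

1. Read three recent posts to match the site's tone and structure.
2. Decide on a filename that follows the existing naming pattern.
3. Write the frontmatter first, then the body.
4. Build the site and fix any errors before finishing.
```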

Why Are Skills Needed?

AI agents are extremely capable, but they don't know anything specific about your project.

For example:

  • "How does this team write commit messages?"
  • "What frontmatter format does this blog use?"
  • "Which commands are used for deployment?"

Without skills, agents can't know any of this. Skills let agents understand "the right way to do things" before acting.

How an Agent Processes a Skill

Let's look at what's happening inside the agent.

Here are the key points:

1. Loading the Skill

The agent reads the skill at the start. The skill content is passed as part of the LLM's input (prompt). The LLM reads this and understands "the right approach for this task."

2. Breaking Down the Task

Based on the instructions, the LLM breaks the task into smaller steps: "Read 3 existing posts first," "then decide on a filename," "then write the frontmatter," and so on.

3. Calling Tools

At each step, the agent calls tools as needed — reading files, searching the web, executing code — following the procedure defined by the skill.

4. Feeding Back Results

Tool results are passed back to the LLM. The LLM looks at the results and decides what to do next, looping until the task is complete.
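The four steps above amount to a single loop. Here is a toy Python sketch of that loop; `fake_llm` and `read_file` are stand-in stubs for illustration, not a real model API or agent framework:

```python
# Toy agent loop: skill text goes into the prompt, the LLM picks tool calls,
# tool results are fed back, and the loop runs until the LLM signals "final".
def fake_llm(history):
    """Stub LLM: asks to read a file once, then finishes."""
    if not any(msg["role"] == "tool" for msg in history):
        return {"action": "tool", "tool": "read_file", "arg": "notes.txt"}
    return {"action": "final", "text": "done"}

TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_agent(skill: str, task: str) -> str:
    # 1. Loading the skill: its text is simply part of the prompt.
    history = [{"role": "system", "content": skill},
               {"role": "user", "content": task}]
    while True:
        step = fake_llm(history)          # 2-3. decide next step / tool call
        if step["action"] == "final":
            return step["text"]
        result = TOOLS[step["tool"]](step["arg"])
        history.append({"role": "tool", "content": result})  # 4. feed back

print(run_agent("Always read notes.txt first.", "Summarize my notes"))  # done
```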

Skill Commands

Skills can be invoked as slash commands (/command-name).

When a command is called, the corresponding Markdown file's content is expanded as a prompt, and the agent begins executing those steps.

Skills Are Growing

The Agent Skills format was developed and open-sourced by Anthropic and is now supported by many tools:

| Tool | Supported |
| --- | --- |
| Claude Code | ✓ |
| GitHub Copilot | ✓ |
| Cursor | ✓ |
| Gemini CLI | ✓ |
| OpenAI Codex | ✓ |
| VS Code | ✓ |

The biggest advantage is that the same skill can be reused across different tools.

Summary

  • Skills are a mechanism for giving agents specialized knowledge and procedures
  • You can create one by writing steps and rules in a Markdown file (SKILL.md)
  • The agent receives the skill as a prompt; the LLM interprets it and executes each step
  • It's an open standard supported by Claude Code, Cursor, GitHub Copilot, and many more

With skills, you no longer have to explain the same things to your AI every time — agents can perform tasks with consistent quality, exactly the way you want.

Interacting with GPT-5 via API

· One min read

GPT-5 has been released, so I tried calling the API from PowerShell.

Code

$uri = "https://api.openai.com/v1/chat/completions"
$headers = @{
    "Authorization" = "Bearer $env:OPENAI_API_KEY"
    "Content-Type" = "application/json"
}

$body = @{
    model = "gpt-5"
    messages = @(
        @{
            role = "user"
            content = "The total cost of a notebook and pencil is 100 yen. A pencil is 40 yen cheaper than a notebook. What is the price of a pencil?"
        }
    )
} | ConvertTo-Json -Depth 10

$response = Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -Body $body

foreach ($choice in $response.choices) {
    $choice.message.content
}

Output

30 yen

Reason:
- Let x be the price of a notebook and y be the price of a pencil.
- x + y = 100
- y = x - 40
- Substituting, 2x - 40 = 100 → x = 70 → y = 30
- Verification: 70 + 30 = 100, and a pencil is 40 yen cheaper than a notebook.
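The model's algebra checks out; a quick brute-force verification of the two equations in Python:

```python
def solve():
    """Brute-force the system: x + y = 100 and y = x - 40 (prices in yen)."""
    for x in range(101):
        y = 100 - x
        if y == x - 40:
            return x, y

notebook, pencil = solve()
print(f"notebook = {notebook} yen, pencil = {pencil} yen")  # notebook = 70 yen, pencil = 30 yen
```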