
Examples

Here’s what it looks like in practice — natural language questions, instant pricing answers:

[Figure: tokencost demo]

Below are detailed examples of the three most common workflows.

Prompt:

“I’m choosing between Claude Sonnet and GPT-5.2 for a summarization pipeline. Compare their pricing and context windows.”

Tool called: compare_models

Output:

Top 5 most cost-effective models matching your criteria:
1. gpt-5.2
Provider: openai | Mode: chat
Input: $2.00/1M | Output: $8.00/1M
Context: 1,047K in / 64K out
Capabilities: function_calling, vision, parallel_function_calling
2. claude-sonnet-4-5
Provider: anthropic | Mode: chat
Input: $3.00/1M | Output: $15.00/1M
Context: 200K in / 8K out
Capabilities: function_calling, vision, parallel_function_calling

Takeaway: GPT-5.2 is cheaper per token and has a roughly 5x larger input context window (1,047K vs 200K tokens). Claude Sonnet 4.5 costs 50% more on input ($3.00 vs $2.00 per 1M) and nearly 2x on output ($15.00 vs $8.00 per 1M), but may be preferred for quality on specific tasks. For a high-volume summarization pipeline, that per-token gap scales linearly with the number of tokens processed.
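The ratios in the takeaway can be checked by hand from the listed rates. The following is a minimal sketch of that arithmetic, not part of the tool itself: the `RATES` table and the `price_ratio` helper are hypothetical names introduced here, and the numbers are copied from the sample output above.

```python
# Rates in USD per 1M tokens, copied from the sample output above.
# RATES and price_ratio are illustrative names, not part of the tool.
RATES = {
    "gpt-5.2": {"input": 2.00, "output": 8.00},
    "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
}


def price_ratio(model_a: str, model_b: str) -> dict[str, float]:
    """Return how many times model_a's rate is of model_b's, per token type."""
    a, b = RATES[model_a], RATES[model_b]
    return {kind: a[kind] / b[kind] for kind in ("input", "output")}


ratios = price_ratio("claude-sonnet-4-5", "gpt-5.2")
print(ratios)  # {'input': 1.5, 'output': 1.875}
```

An input ratio of 1.5 corresponds to "50% more", and an output ratio of 1.875 to "nearly 2x".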


Prompt:

“What’s the biggest context window I can get for under $1/1M input tokens?”

Tool called: compare_models

Output:

Top 5 most cost-effective models matching your criteria:
1. gemini/gemini-2.5-flash
Provider: gemini | Mode: chat
Input: $0.15/1M | Output: $0.60/1M
Context: 1,048K in / 65K out
Capabilities: function_calling, vision, parallel_function_calling
2. gemini/gemini-2.0-flash
Provider: gemini | Mode: chat
Input: $0.10/1M | Output: $0.40/1M
Context: 1,048K in / 8K out
Capabilities: function_calling, vision, parallel_function_calling
3. deepseek/deepseek-chat
Provider: deepseek | Mode: chat
Input: $0.14/1M | Output: $0.28/1M
Context: 64K in / 8K out
Capabilities: function_calling, parallel_function_calling
4. gemini/gemini-2.0-flash-lite
Provider: gemini | Mode: chat
Input: $0.04/1M | Output: $0.15/1M
Context: 1,048K in / 8K out
Capabilities: function_calling, vision, parallel_function_calling
5. mistral/mistral-small-latest
Provider: mistral | Mode: chat
Input: $0.10/1M | Output: $0.30/1M
Context: 32K in / 32K out
Capabilities: function_calling, parallel_function_calling

Takeaway: Gemini models dominate the budget-friendly, large-context space, offering 1M+ token input windows for $0.04 to $0.15 per 1M input tokens. If you need a massive context window on a tight budget, Gemini 2.5 Flash gives you the best balance of capability and cost: it is the only model in this list that pairs the 1,048K input window with a 65K output limit.
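To answer the prompt's literal question (largest context window under a price cap), the same five results can be filtered and re-sorted locally. This is only a sketch under stated assumptions: the `Model` dataclass and `largest_context_under` function are hypothetical, the data is copied from the sample output above, and the ordering shown here is not necessarily how `compare_models` ranks its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_per_1m: float  # USD per 1M input tokens
    context_in_k: int    # input context window, in thousands of tokens


# Values copied from the sample output above.
MODELS = [
    Model("gemini/gemini-2.5-flash", 0.15, 1048),
    Model("gemini/gemini-2.0-flash", 0.10, 1048),
    Model("deepseek/deepseek-chat", 0.14, 64),
    Model("gemini/gemini-2.0-flash-lite", 0.04, 1048),
    Model("mistral/mistral-small-latest", 0.10, 32),
]


def largest_context_under(models: list[Model], max_input_price: float) -> list[Model]:
    """Keep models under the price cap, largest input context first.

    Ties on context size are broken by the lower input price.
    """
    eligible = [m for m in models if m.input_per_1m < max_input_price]
    return sorted(eligible, key=lambda m: (-m.context_in_k, m.input_per_1m))


ranked = largest_context_under(MODELS, 1.00)
for m in ranked:
    print(f"{m.name}: {m.context_in_k}K in, ${m.input_per_1m:.2f}/1M input")
```

With this tie-break, the three Gemini models come first because they share the 1,048K window, ordered from cheapest to most expensive input rate.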


Prompt:

“How much would it cost to run Claude Opus vs Sonnet for a batch of 10K requests, averaging 2K input and 500 output tokens each?”

Tools called: calculate_estimate (twice — once per model)

Claude Opus 4.6:

Model: claude-opus-4-6
Provider: anthropic
Token counts:
Input: 20,000,000 tokens
Output: 5,000,000 tokens
Estimated cost:
Input: $300.00
Output: $150.00
Total: $450.00
Rate per 1M tokens:
Input: $15.00
Output: $30.00

Claude Sonnet 4.6:

Model: claude-sonnet-4-6
Provider: anthropic
Token counts:
Input: 20,000,000 tokens
Output: 5,000,000 tokens
Estimated cost:
Input: $60.00
Output: $75.00
Total: $135.00
Rate per 1M tokens:
Input: $3.00
Output: $15.00

Takeaway: Sonnet is 3.3x cheaper for this workload ($135 vs $450). Unless the task specifically requires Opus-level reasoning, Sonnet saves $315 per 10K-request batch — making it the clear choice for high-volume pipelines.
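The totals above are straightforward to reproduce: multiply requests by tokens per request, then by the per-1M rate. The sketch below shows that arithmetic. It is an illustration, not the tool's implementation; `estimate_cost` is a hypothetical helper, and the rates are the ones printed in the sample output.

```python
from decimal import Decimal

TOKENS_PER_UNIT = 1_000_000  # rates are quoted per 1M tokens


def estimate_cost(requests: int, input_per_request: int, output_per_request: int,
                  input_rate: str, output_rate: str) -> dict:
    """Estimate batch cost in USD from per-request token counts and per-1M rates."""
    input_tokens = requests * input_per_request
    output_tokens = requests * output_per_request
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    input_cost = Decimal(input_rate) * input_tokens / TOKENS_PER_UNIT
    output_cost = Decimal(output_rate) * output_tokens / TOKENS_PER_UNIT
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total": input_cost + output_cost,
    }


# 10K requests, 2K input and 500 output tokens each; rates from the output above.
opus = estimate_cost(10_000, 2_000, 500, "15.00", "30.00")
sonnet = estimate_cost(10_000, 2_000, 500, "3.00", "15.00")

print(f"Opus total:   ${opus['total']:.2f}")
print(f"Sonnet total: ${sonnet['total']:.2f}")
print(f"Savings:      ${opus['total'] - sonnet['total']:.2f}")
print(f"Ratio:        {opus['total'] / sonnet['total']:.1f}x")
```

Because cost is linear in token count, the same ratio holds at any batch size as long as the input/output mix stays the same.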