Getting Started
tokencost (tokencost-dev) is a Model Context Protocol (MCP) server that provides real-time LLM pricing data. It runs as a local process and communicates over stdio.
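"Communicates over stdio" means the client writes newline-delimited JSON-RPC 2.0 messages to the server's stdin and reads responses from its stdout. A minimal sketch of the first message a client sends (field values are illustrative; see the MCP specification for the full handshake):

```python
import json

# JSON-RPC 2.0 "initialize" request -- the first message an MCP client
# writes to the server's stdin, one JSON object per line.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # illustrative version string
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}

# Serialized form, ready to write to the server process's stdin.
wire_message = json.dumps(initialize_request) + "\n"
print(wire_message, end="")
```

Every client below (Claude Code, Claude Desktop, VS Code, Cursor, Windsurf) performs this handshake for you; you never write it by hand.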
Prerequisites
- Node.js 18+
That’s it. No API keys, no accounts, no configuration files.
Claude Code
The fastest way to get started:
```sh
claude mcp add tokencost-dev -- npx -y tokencost-dev
```

This registers the MCP server and it will be available in all future Claude Code sessions.
Claude Desktop
Open Claude > Settings > Developer > Edit Config to open claude_desktop_config.json, then add:
```json
{
  "mcpServers": {
    "tokencost-dev": {
      "command": "npx",
      "args": ["-y", "tokencost-dev"]
    }
  }
}
```

Save the file and restart Claude Desktop.
VS Code (GitHub Copilot)
Create or edit .vscode/mcp.json in your workspace:
```json
{
  "servers": {
    "tokencost-dev": {
      "command": "npx",
      "args": ["-y", "tokencost-dev"]
    }
  }
}
```

VS Code will detect the file and make the tools available in Copilot chat (agent mode).
Cursor
Add to your Cursor MCP config (.cursor/mcp.json):
```json
{
  "mcpServers": {
    "tokencost-dev": {
      "command": "npx",
      "args": ["-y", "tokencost-dev"]
    }
  }
}
```

Windsurf
Open Windsurf Settings > Cascade > MCP Servers, or edit ~/.codeium/windsurf/mcp_config.json directly:
```json
{
  "mcpServers": {
    "tokencost-dev": {
      "command": "npx",
      "args": ["-y", "tokencost-dev"]
    }
  }
}
```

Other MCP clients
The server config is the same for any MCP client that supports stdio transport:
```json
{
  "mcpServers": {
    "tokencost-dev": {
      "command": "npx",
      "args": ["-y", "tokencost-dev"]
    }
  }
}
```

Consult your client’s documentation for where to place this configuration.
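For clients not listed here, it can help to know what they do with this config: they spawn `command` with `args` and pipe JSON-RPC over the child process's stdio. A rough sketch, assuming the config shape above (the spawn itself is defined but not executed here):

```python
import json
import subprocess

# The same config every client above uses, parsed from its JSON form.
CONFIG = json.loads(
    '{ "mcpServers": { "tokencost-dev": '
    '{ "command": "npx", "args": ["-y", "tokencost-dev"] } } }'
)

def build_argv(config: dict, name: str) -> list[str]:
    """Turn one mcpServers entry into the argv an MCP client would spawn."""
    entry = config["mcpServers"][name]
    return [entry["command"], *entry["args"]]

def spawn_server(argv: list[str]) -> subprocess.Popen:
    """Launch the server with pipes on stdin/stdout for JSON-RPC traffic."""
    return subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )

argv = build_argv(CONFIG, "tokencost-dev")
print(argv)  # prints ['npx', '-y', 'tokencost-dev']
```

If your client only asks for a command string rather than a JSON file, `npx -y tokencost-dev` is the equivalent one-liner.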
Your first query
Once installed, just ask your AI assistant a pricing question in natural language:
“How much does Claude Sonnet 4.5 cost per million tokens?”
The assistant will call the get_model_details tool and return something like:
```
Model: claude-sonnet-4-5
Provider: anthropic
Mode: chat

Pricing (per 1M tokens):
  Input: $3.00
  Output: $15.00

Context Window:
  Max Input: 200K
  Max Output: 8K

Capabilities: vision, function_calling, parallel_function_calling
```

How it works
- On first use, the server fetches pricing data from the LiteLLM community registry
- Data is cached in-memory for 24 hours (with a disk fallback)
- Your AI assistant calls one of the 4 tools via the MCP protocol
- Results are returned as formatted text
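The caching behavior in step 2 can be pictured as a small TTL cache; this is a hypothetical sketch of the idea (the server's actual implementation may differ, and `pricing_cache.json` is an invented path):

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 24 * 60 * 60  # keep pricing data for 24 hours
DISK_CACHE = Path("pricing_cache.json")  # hypothetical fallback location

_memory: dict = {"data": None, "fetched_at": 0.0}

def get_pricing(fetch) -> dict:
    """Return cached pricing, refetching once the TTL has expired.

    `fetch` is a zero-argument callable that returns the registry data;
    if fetching fails, fall back to the last copy written to disk.
    """
    now = time.time()
    if _memory["data"] is not None and now - _memory["fetched_at"] < TTL_SECONDS:
        return _memory["data"]  # fresh enough: serve from memory
    try:
        data = fetch()
        DISK_CACHE.write_text(json.dumps(data))  # refresh disk fallback
    except Exception:
        data = json.loads(DISK_CACHE.read_text())  # network failed: use disk
    _memory.update(data=data, fetched_at=now)
    return data
```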
No data leaves your machine — the only network request is fetching the public pricing registry.
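As a worked example, the per-1M-token prices shown in the sample output translate into a per-request cost like this (the token counts are made up for illustration):

```python
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (claude-sonnet-4-5)
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the prices above."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# 100K tokens in, 20K tokens out:
cost = request_cost(100_000, 20_000)
print(f"${cost:.2f}")  # prints "$0.60"
```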