Skip to content

Getting Started

tokencost (tokencost-dev) is an MCP server that provides real-time LLM pricing data. It runs as a local process and communicates over stdio.

That’s it. No API keys, no accounts, no configuration files.

The fastest way to get started:

Terminal window
claude mcp add tokencost-dev -- npx -y tokencost-dev

This registers the MCP server and it will be available in all future Claude Code sessions.

Open Claude > Settings > Developer > Edit Config to open claude_desktop_config.json, then add:

{
"mcpServers": {
"tokencost-dev": {
"command": "npx",
"args": ["-y", "tokencost-dev"]
}
}
}

Save the file and restart Claude Desktop.

Create or edit .vscode/mcp.json in your workspace:

{
"servers": {
"tokencost-dev": {
"command": "npx",
"args": ["-y", "tokencost-dev"]
}
}
}

VS Code will detect the file and make the tools available in Copilot chat (agent mode).

Add to your Cursor MCP config (.cursor/mcp.json):

{
"mcpServers": {
"tokencost-dev": {
"command": "npx",
"args": ["-y", "tokencost-dev"]
}
}
}

Open Windsurf Settings > Cascade > MCP Servers, or edit ~/.codeium/windsurf/mcp_config.json directly:

{
"mcpServers": {
"tokencost-dev": {
"command": "npx",
"args": ["-y", "tokencost-dev"]
}
}
}

The server config is the same for any MCP client that supports stdio transport:

{
"mcpServers": {
"tokencost-dev": {
"command": "npx",
"args": ["-y", "tokencost-dev"]
}
}
}

Consult your client’s documentation for where to place this configuration.

Once installed, just ask your AI assistant a pricing question in natural language:

“How much does Claude Sonnet 4.5 cost per million tokens?”

The assistant will call the get_model_details tool and return something like:

Model: claude-sonnet-4-5
Provider: anthropic
Mode: chat
Pricing (per 1M tokens):
Input: $3.00
Output: $15.00
Context Window:
Max Input: 200K
Max Output: 8K
Capabilities: vision, function_calling, parallel_function_calling
  1. On first use, the server fetches pricing data from the LiteLLM community registry
  2. Data is cached in-memory for 24 hours (with a disk fallback)
  3. Your AI assistant calls one of the 4 tools via the MCP protocol
  4. Results are returned as formatted text

No data leaves your machine — the only network request is fetching the public pricing registry.