Token-Based Pricing

Overview

Augment uses token-based pricing. Instead of buying a pool of credits and learning a separate conversion table, you pay for the actual resources your work consumes. Every request is billed as the sum of three components:

LLM tokens at the model provider’s public API list price
Compute for time Cosmos spends running your work, at our published per-hour rate
A flat 40% service fee on LLM usage, which funds the Context Engine and the Cosmos platform

This gives you the flexibility to choose the right model for each task, with full transparency on what every request costs.

How tokens are billed

When you use Cosmos or Auggie CLI, three things can cost money:

LLM tokens — every model call uses input and output tokens. You’re billed at the model provider’s public API list price.
Service fee — Augment adds a 40% service fee on LLM usage. There is no service fee on compute. The service fee covers the Context Engine that hosts and dynamically indexes your codebase, and the Cosmos platform that runs your sessions.
Cosmos compute — when Cosmos runs work in a sandbox, you pay for the compute time it consumes at $0.19 per hour, billed in 5-minute increments (rounded up). A session that uses 7 minutes of compute is billed as 10 minutes; a session that uses 16 minutes is billed as 20 minutes.

All three components draw from the same usage balance.
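The compute rounding rule can be sketched in a few lines. This is a minimal illustration of the 5-minute, round-up billing described above, not Augment's billing code; the function names are hypothetical, and it assumes rounding is applied once per session.

```python
import math

COMPUTE_RATE_PER_HOUR = 0.19  # USD per hour, the published Cosmos compute rate
INCREMENT_MINUTES = 5         # compute is billed in 5-minute increments, rounded up


def billed_minutes(used_minutes: float) -> int:
    """Round a session's compute time up to the next 5-minute increment."""
    if used_minutes <= 0:
        return 0
    return math.ceil(used_minutes / INCREMENT_MINUTES) * INCREMENT_MINUTES


def compute_cost(used_minutes: float) -> float:
    """USD cost of one session's compute. No service fee applies to compute."""
    return billed_minutes(used_minutes) / 60 * COMPUTE_RATE_PER_HOUR


# The two examples from the text: 7 minutes bills as 10, 16 minutes bills as 20.
assert billed_minutes(7) == 10
assert billed_minutes(16) == 20
```

At $0.19 per hour, the 7-minute session costs about $0.03.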

The Business plan

Item	Included
Monthly price	$100 / month
Included usage	$100 across LLM tokens, compute, and service fees
Service fee	40% of LLM usage (no fee on compute)
Seats	Up to 50 — no per-seat charge
Products included	Cosmos and Auggie CLI
Cosmos compute rate	$0.19 / hour, billed in 5-minute increments
Top-ups	Pay-as-you-go at the same rates once included usage is consumed

If you need more than 50 seats or a custom Cosmos setup, see Pricing Plans for Enterprise options.

A typical month on the Business plan

Here’s how a developer might spend their $100 of included usage in a typical month:

Component	Amount	What it is
LLM tokens	$60	Input and output tokens at public API list price
Service fee	$24	40% of LLM token spend
Compute	$16	Cosmos compute at $0.19/hour
Total	$100

Your actual mix depends on the models you choose, how much you use Cosmos, and the size of the tasks you run.
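To see how the split above falls out of the rates, here is a small sketch that works backward from a budget and a planned compute spend. The helper name `split_budget` is hypothetical; the only inputs taken from this page are the 40% service fee and the $0.19/hour compute rate.

```python
SERVICE_FEE_RATE = 0.40       # 40% service fee, applied to LLM usage only
COMPUTE_RATE_PER_HOUR = 0.19  # USD per hour of Cosmos compute


def split_budget(budget: float, compute_spend: float) -> dict:
    """Split a usage budget into LLM tokens, service fee, and compute.

    Every LLM dollar carries a 40% fee, so the LLM share is whatever
    remains after compute, divided by 1.40.
    """
    if compute_spend > budget:
        raise ValueError("compute spend exceeds the budget")
    llm = (budget - compute_spend) / (1 + SERVICE_FEE_RATE)
    fee = llm * SERVICE_FEE_RATE
    return {
        "llm_tokens": round(llm, 2),
        "service_fee": round(fee, 2),
        "compute": round(compute_spend, 2),
        "compute_hours": round(compute_spend / COMPUTE_RATE_PER_HOUR, 1),
    }


typical_month = split_budget(budget=100.0, compute_spend=16.0)
```

With a $100 budget and $16 of compute, this reproduces the $60 / $24 / $16 split in the table, and shows that $16 buys roughly 84 billed hours of Cosmos compute.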

Top-ups

Once you’ve used your $100 of included usage, billing continues automatically on pay-as-you-go at the same rates: LLM tokens at public API list price, the 40% service fee on LLM, and compute at $0.19/hour. There’s no minimum top-up amount.
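As a rough sketch of how a month's charges might add up, the snippet below separates usage covered by the plan from pay-as-you-go usage. It assumes the monthly bill is simply the $100 plan price plus any usage beyond the $100 included amount; the function name and the shape of the result are illustrative, not an Augment invoice format.

```python
PLAN_PRICE = 100.00      # USD per month for the Business plan
INCLUDED_USAGE = 100.00  # USD of usage included each billing cycle


def monthly_charges(total_usage: float) -> dict:
    """Separate a month's usage into included and pay-as-you-go portions.

    `total_usage` is the dollar total of LLM tokens, service fees, and
    compute, all of which draw from the same balance.
    """
    pay_as_you_go = max(0.0, total_usage - INCLUDED_USAGE)
    return {
        "plan": PLAN_PRICE,
        "pay_as_you_go": pay_as_you_go,
        "total": PLAN_PRICE + pay_as_you_go,
    }


light_month = monthly_charges(80.00)   # stays within included usage
heavy_month = monthly_charges(137.50)  # 37.50 billed as pay-as-you-go
```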

Token costs by model

Different models consume your usage balance at different rates based on their underlying provider pricing. All rates below are per million tokens, at the provider’s public API list price. GPT model rates are the standard (default) API tier.

Model	Input	Output	Cache read	Cache write	Best for
Claude Sonnet 4.6	$3.00	$15.00	$0.30	$3.75	Latest generation Sonnet. Balanced capability, ideal for medium or large tasks and multi-step work.
Claude Sonnet 4.5	$3.00	$15.00	$0.30	$3.75	Previous-generation Sonnet. Same pricing as 4.6; consider upgrading to 4.6 for the latest improvements.
Claude Opus 4.7	$5.00	$25.00	$0.50	$6.25	Most capable model. Best for long-running tasks, deep reasoning, and opinionated code generation.
Claude Opus 4.6 / 4.5	$5.00	$25.00	$0.50	$6.25	Previous-generation Opus models. Same pricing as 4.7.
Claude Fable 5	$10.00	$50.00	$1.00	$12.50	Anthropic’s newest frontier model. Premium option for the most demanding reasoning and long-horizon agentic work.
Claude Haiku 4.5	$1.00	$5.00	$0.10	$1.25	Lightweight, fast reasoning. Best for quick edits and small tasks.
Gemini 3.1 Pro	$2.00	$12.00	$0.20	$2.00	Structural thinking, planning, debugging, and daily execution.
GPT-5.1	$1.25	$10.00	$0.125	$1.25	Advanced reasoning and context. Good for medium-size tasks.
GPT-5.2	$1.75	$14.00	$0.175	$1.75	Enhanced reasoning for complex tasks requiring long chains of thought.
GPT-5.4	$2.50	$15.00	$0.25	$2.50	Advanced GPT generation. Smart and token-efficient. Standout computer use and multi-agent orchestration.
GPT-5.5	$5.00	$30.00	$0.50	$5.00	Most advanced GPT option. Designed for highest-complexity coding and multi-agent orchestration.
Kimi K2.6	$0.95	$4.00	$0.16	$0.95	Cheap agentic work. Smarter than Haiku at a fraction of frontier-model cost.
Prism (Claude + Gemini)	Variable (routed)	Variable (routed)	Variable (routed)	Variable (routed)	Routes among Claude Opus 4.7, Claude Sonnet 4.6, and Gemini 3.0 Flash.
Prism (GPT + Kimi)	Variable (routed)	Variable (routed)	Variable (routed)	Variable (routed)	Routes among GPT-5.5, GPT-5.4, and Kimi K2.6.

Per-million-token rates follow each provider’s public API list price and change over time, so this table is a snapshot. Your dashboard at app.augmentcode.com always shows the live per-million-token rates for input, output, cache reads, and cache writes.
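To compare models on the same workload, you can multiply each token count by its per-million rate. The sketch below copies three rows from the snapshot table above; the `Rates` class, the `llm_cost` function, and the token counts are made up for illustration, and the result excludes the service fee and compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, copied from the snapshot table."""
    input: float
    output: float
    cache_read: float
    cache_write: float


RATES = {
    "Claude Sonnet 4.6": Rates(3.00, 15.00, 0.30, 3.75),
    "Claude Haiku 4.5": Rates(1.00, 5.00, 0.10, 1.25),
    "GPT-5.1": Rates(1.25, 10.00, 0.125, 1.25),
}


def llm_cost(model: str, input_tokens: int = 0, output_tokens: int = 0,
             cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """LLM token cost in USD, before the service fee."""
    r = RATES[model]
    return (
        input_tokens * r.input
        + output_tokens * r.output
        + cache_read_tokens * r.cache_read
        + cache_write_tokens * r.cache_write
    ) / 1_000_000


# Hypothetical workload: 20k fresh input, 4k output, 100k cached input tokens.
workload = dict(input_tokens=20_000, output_tokens=4_000, cache_read_tokens=100_000)
costs = {model: llm_cost(model, **workload) for model in RATES}
```

For this workload the token cost comes to about $0.15 on Sonnet 4.6 and $0.05 on Haiku 4.5. In practice different models also use different numbers of tokens for the same task, so treat this as a rate comparison rather than a prediction.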

Prism routed pricing

Prism uses variable pricing because each request is routed instead of billed against a fixed model rate. When you choose a Prism option, Augment selects the best-fit model from that family based on the task, context, and current system conditions, and the cost reflects the route used. Prism is designed to cost, on average, 20–30% less than running the same work on a frontier model, with minimal quality tradeoff. Exact savings depend on the task.

Example tasks

To illustrate how different models compare, here are example tasks and their approximate cost for each model. All costs include the 40% service fee on LLM usage and assume no Cosmos compute time.

Sonnet task: Fix a 500 error in an API endpoint

The /api/users/:id endpoint returns 500 errors when a user has no associated organization. Add null checking and return a 404 with a clear error message instead. Test that users with organizations still work correctly.

Cost: ~$0.25 with Sonnet 4.6 (or Sonnet 4.5)

Model	Approx. cost	Notes
Sonnet 4.6 / 4.5	$0.25	Baseline
Opus 4.7 / 4.6 / 4.5	$0.43	Use for harder tasks
Claude Fable 5	$0.85	Premium option — 2× Opus pricing
GPT-5.2	$0.34	Use for harder tasks
GPT-5.4	$0.18	$0.07 saved
GPT-5.5	$0.36	Use for highest-complexity tasks
Gemini 3.1 Pro	$0.23	$0.02 saved
Haiku 4.5	$0.08	$0.17 saved
GPT-5.1	$0.19	$0.06 saved
Kimi K2.6	$0.13	$0.12 saved
Prism (Claude + Gemini)	Variable routed cost	Savings vary by route
Prism (GPT + Kimi)	Variable routed cost	Savings vary by route

Opus task: Design and implement a multi-tenant billing system

Our B2B SaaS platform needs a multi-tenant billing system. Design the database schema, implement the metering service, integrate with Stripe’s usage-based billing API, and handle edge cases like mid-cycle plan changes, prorations, and failed payments. Consider how this will scale to 10k+ tenants.

Cost: ~$0.85 with Opus 4.7 (or Opus 4.6, Opus 4.5)

Costs are illustrative for a standard medium-complexity task and include the 40% service fee on LLM usage. Actual cost depends on the model you choose, how much repo context the request loads, and whether the task uses Cosmos compute.

Monitoring token usage

You can track token consumption from Auggie CLI or the web dashboard.

In Auggie CLI

Monitor usage per session directly from the CLI as you work.

On the web

Visit app.augmentcode.com for detailed dashboards that show:

Total usage by your team in dollars
Usage per team member
Breakdown by model, activity, and component (LLM, service fee, compute)
Usage trends over time
Remaining included usage and pay-as-you-go spend

Team administrators can access more detailed analytics to optimize usage and spot tasks where a more efficient model would do the job.

Understanding your usage breakdown

Usage is organized by model, activity type, and component (LLM, service fee, or compute).

Session types

Activity	What it is
CliAgent	Command-line session (Auggie CLI)
Cosmos	Sandboxed compute session running long-form work

Optional features

Activity	What it is
Prompt Enhancer	Token usage from improving your prompts before sending them.
Code Review	Automated code review for pull requests (when enabled for your repository).
Skills	Token usage attributed to a specific Skill invoked during a session.

Background activities

Augment performs lightweight processing in the background to keep your experience smooth.

Activity	What it does
Context Compression	Summarizes conversation history to keep long sessions fast and responsive.
System	General background processing that helps Augment work smoothly.

These background activities use a small fraction of your total token consumption.

Tips for optimizing token usage

Match the model to the task. Use Haiku for simple tasks, GPT-5.1 for medium tasks, Sonnet for complex work, Opus for the hardest challenges, GPT-5.4 for computer use and multi-agent orchestration, GPT-5.5 for the highest-complexity coding workflows, and Prism when you want Augment to balance quality and cost by routing within a curated model family.
Be specific in your prompts. Clear, detailed instructions help models work more efficiently and use fewer output tokens.
Lean on prompt caching. Augment automatically caches stable context (your repo index, AGENTS.md, recently used files). Cached input tokens are billed at the provider’s reduced cache read rate, typically around 10% of the input rate, and because the service fee is a percentage of LLM spend, it shrinks along with them.
Break down large tasks. Sometimes splitting a complex task into smaller ones is more token-efficient than planning and executing everything in one session.
Scope your Cosmos sessions. Compute is billed at $0.19/hour in 5-minute increments with no service fee. Closing a session when you’re done releases the compute and stops the meter at the next increment boundary.
Review usage patterns. Check your dashboards regularly to identify optimization opportunities. Team admins can set guidelines on which models to use for which work.
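The effect of prompt caching can be estimated with a simplified model. The sketch below assumes the same block of context is sent several times in a session: without caching, every send pays the input rate; with caching, the first send pays the cache write rate and each later send pays the cache read rate. Real provider caches also expire and have minimum sizes, which this ignores. The function name and the 100,000-token example are hypothetical; the rates are the Claude Sonnet 4.6 row from the table above.

```python
SERVICE_FEE_RATE = 0.40  # 40% service fee on LLM usage


def repeated_context_cost(tokens: int, sends: int, input_rate: float,
                          cache_read_rate: float, cache_write_rate: float):
    """Return (uncached, cached) USD cost of sending the same context `sends` times.

    Rates are USD per million tokens. Both results include the service fee.
    """
    if sends < 1:
        raise ValueError("sends must be at least 1")
    millions = tokens / 1_000_000
    uncached = sends * millions * input_rate
    # Simplifying assumption: one cache write, then cache reads for every later send.
    cached = millions * (cache_write_rate + (sends - 1) * cache_read_rate)
    fee_multiplier = 1 + SERVICE_FEE_RATE
    return uncached * fee_multiplier, cached * fee_multiplier


# 100k tokens of stable context, sent 5 times, at Sonnet 4.6 rates.
uncached, cached = repeated_context_cost(100_000, 5, 3.00, 0.30, 3.75)
```

Under these assumptions the five sends cost about $2.10 uncached and about $0.69 cached. With a single send, the cached path is slightly more expensive because the cache write rate is higher than the input rate.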

Frequently asked questions

How is the cost of a single request calculated?

Each request is billed as the sum of input tokens × input rate, output tokens × output rate, cache reads × cache read rate, and cache writes × cache write rate (token rates are per million tokens), plus a 40% service fee on that LLM total, plus billed Cosmos compute time (in hours, rounded up to 5-minute increments) × $0.19/hour if compute was used. Each request appears as a line item in your dashboard with the component breakdown.
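Put together, the calculation looks roughly like the sketch below. The function name, the dictionary keys, and the example token counts are hypothetical; the rates are the Claude Sonnet 4.6 snapshot values from the table above. It assumes the 5-minute rounding is applied to the compute time passed in, which the page describes per session.

```python
import math

SERVICE_FEE_RATE = 0.40       # applied to the LLM total only
COMPUTE_RATE_PER_HOUR = 0.19  # USD per hour of Cosmos compute
INCREMENT_MINUTES = 5         # compute rounds up to 5-minute increments


def request_cost(tokens: dict, rates: dict, compute_minutes: float = 0.0) -> dict:
    """Break a request's cost into LLM, service fee, and compute (USD).

    `tokens` maps a token kind to a count; `rates` maps the same kinds
    to USD per million tokens.
    """
    llm = sum(tokens.get(kind, 0) * rates[kind] for kind in rates) / 1_000_000
    service_fee = llm * SERVICE_FEE_RATE
    billed_minutes = math.ceil(compute_minutes / INCREMENT_MINUTES) * INCREMENT_MINUTES
    compute = billed_minutes / 60 * COMPUTE_RATE_PER_HOUR
    return {
        "llm": llm,
        "service_fee": service_fee,
        "compute": compute,
        "total": llm + service_fee + compute,
    }


sonnet_rates = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}
example = request_cost(
    {"input": 50_000, "output": 10_000, "cache_read": 200_000, "cache_write": 40_000},
    sonnet_rates,
    compute_minutes=12,  # billed as 15 minutes
)
```

For these made-up numbers the breakdown is about $0.51 of LLM tokens, $0.20 of service fee, and $0.05 of compute, for a total of roughly $0.76.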

What does the 40% service fee cover?

The service fee covers the Context Engine — which hosts your codebase and dynamically indexes changes as you work — and the Cosmos platform that orchestrates your sessions, manages model fallback and concurrency, and runs the surrounding infrastructure.

Is there a service fee on compute?

No. Cosmos compute is billed at $0.19/hour with no service fee on top.

Does unused included usage roll over?

No. The $100 of included usage resets at the start of each billing cycle.

Is included usage prorated?

Yes. If you start a Business plan mid-cycle, included usage is prorated for the remainder of the billing period.
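The exact proration formula is not stated here. As a rough way to estimate it, the sketch below assumes simple linear, day-based proration; that assumption, the function name, and the 30-day cycle in the example are all illustrative, so check your dashboard for the actual figure.

```python
INCLUDED_USAGE = 100.00  # USD of included usage for a full billing cycle


def prorated_included_usage(days_remaining: int, days_in_cycle: int) -> float:
    """Estimate included usage for a partial cycle.

    Assumes linear proration by days remaining; this is an assumption,
    not a documented formula.
    """
    if not 0 <= days_remaining <= days_in_cycle:
        raise ValueError("days_remaining must be between 0 and days_in_cycle")
    return round(INCLUDED_USAGE * days_remaining / days_in_cycle, 2)


# Starting with 10 days left in a 30-day cycle.
estimate = prorated_included_usage(10, 30)
```

Under this assumption, starting with 10 days left in a 30-day cycle gives about $33.33 of included usage.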

What happens when I use up my $100?

Billing continues automatically on pay-as-you-go at the same rates. There’s no need to top up manually, and there’s no minimum spend.

Can I switch models mid-conversation?

Yes. In Auggie CLI, switch models mid-conversation with the /model command. Each message consumes tokens based on the model selected for that specific message.

Where do I see live per-million-token rates?

Visit app.augmentcode.com and open Usage → Models. The dashboard always shows current input, output, cache read, and cache write rates for every available model.

Related resources

Available Models

Learn about the different AI models available in Augment.

Pricing Plans

View current pricing plans and what’s included on each.

Teams Admin Guide

Manage team subscriptions and billing.

Usage Dashboards & Budgets

Monitor token consumption and set budgets across your organization.
