Overview
Augment uses token-based pricing. Instead of buying a pool of credits and learning a separate conversion table, you pay for the actual resources your work consumes. Every request is billed as the sum of three components:
- LLM tokens at the model provider’s public API list price
- Compute for time Cosmos spends running your work, at our published per-hour rate
- A flat 40% service fee on LLM usage, which funds the Context Engine and the Cosmos platform
How tokens are billed
When you use Cosmos or Auggie CLI, three things can cost money:
- LLM tokens: every model call uses input and output tokens. You’re billed at the model provider’s public API list price.
- Service fee: Augment adds a 40% service fee on LLM usage. There is no service fee on compute. The service fee covers the Context Engine that hosts and dynamically indexes your codebase, and the Cosmos platform that runs your sessions.
- Cosmos compute: when Cosmos runs work in a sandbox, you pay for the compute time it consumes at $0.19 per hour, billed in 5-minute increments (rounded up). A session that uses 7 minutes of compute is billed as 10 minutes; a session that uses 16 minutes is billed as 20 minutes.
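The rounding rule for Cosmos compute can be sketched in a few lines of Python. This is a minimal illustration, not Augment's billing code: the function names are hypothetical, and treating zero minutes as zero cost is an assumption.

```python
import math

COMPUTE_RATE_PER_HOUR = 0.19  # USD per hour, from the pricing above
INCREMENT_MINUTES = 5         # billing increment; usage is rounded up


def billed_minutes(used_minutes: float) -> int:
    """Round a session's compute time up to the next 5-minute increment."""
    if used_minutes <= 0:
        return 0  # assumption: no compute used means nothing billed
    return math.ceil(used_minutes / INCREMENT_MINUTES) * INCREMENT_MINUTES


def compute_cost(used_minutes: float) -> float:
    """Cost in USD of a session's compute time (no service fee applies)."""
    return billed_minutes(used_minutes) / 60 * COMPUTE_RATE_PER_HOUR


print(billed_minutes(7))   # 10
print(billed_minutes(16))  # 20
print(f"${compute_cost(16):.4f}")  # 20 billed minutes at $0.19/hour
```

At this rate a single 5-minute increment costs under two cents, so the rounding matters mainly for workloads with many short sessions.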
The Business plan
| Item | Included |
|---|---|
| Monthly price | $100 / month |
| Included usage | $100 across LLM tokens, compute, and service fees |
| Service fee | 40% of LLM usage (no fee on compute) |
| Seats | Up to 50 — no per-seat charge |
| Products included | Cosmos and Auggie CLI |
| Cosmos compute rate | $0.19 / hour, billed in 5-minute increments |
| Top-ups | Pay-as-you-go at the same rates once included usage is consumed |
A typical month on the Business plan
Here’s how a developer might spend their $100 of included usage in a typical month:
| Component | Amount | What it is |
|---|---|---|
| LLM tokens | $60 | Input and output tokens at public API list price |
| Service fee | $24 | 40% of LLM token spend |
| Compute | $16 | Cosmos compute at $0.19/hour |
| Total | $100 | |
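The arithmetic behind this breakdown can be reproduced with a short sketch. The function name `monthly_breakdown` is hypothetical, and the compute hours are back-calculated from the $16 figure in the table rather than taken from any real usage data.

```python
SERVICE_FEE_RATE = 0.40        # applied to LLM usage only
COMPUTE_RATE_PER_HOUR = 0.19   # USD per hour of Cosmos compute


def monthly_breakdown(llm_spend: float, compute_hours: float) -> dict:
    """Split a month's usage into the three billed components (USD)."""
    service_fee = llm_spend * SERVICE_FEE_RATE
    compute = compute_hours * COMPUTE_RATE_PER_HOUR
    return {
        "llm": round(llm_spend, 2),
        "service_fee": round(service_fee, 2),
        "compute": round(compute, 2),
        "total": round(llm_spend + service_fee + compute, 2),
    }


# $16 of compute corresponds to roughly 84 hours at $0.19/hour.
compute_hours = 16 / COMPUTE_RATE_PER_HOUR
print(monthly_breakdown(llm_spend=60.0, compute_hours=compute_hours))
```

With $60 of LLM tokens, the 40% fee adds $24, and the remaining $16 of the included $100 covers compute.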
Top-ups
Once you’ve used your $100 of included usage, billing continues automatically on pay-as-you-go at the same rates: LLM tokens at public API list price, the 40% service fee on LLM, and compute at $0.19/hour. There’s no minimum top-up amount.
Token costs by model
Different models consume your usage balance at different rates based on their underlying provider pricing. All rates below are per million tokens, at the provider’s public API list price. GPT model rates are the standard (default) API tier.
| Model | Input | Output | Cache read | Cache write | Best for |
|---|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 | Latest generation Sonnet. Balanced capability, ideal for medium or large tasks and multi-step work. |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 | $3.75 | Previous-generation Sonnet. Same pricing as 4.6; consider upgrading to 4.6 for the latest improvements. |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 | Most capable model. Best for long-running tasks, deep reasoning, and opinionated code generation. |
| Claude Opus 4.6 / 4.5 | $5.00 | $25.00 | $0.50 | $6.25 | Previous-generation Opus models. Same pricing as 4.7. |
| Claude Fable 5 | $10.00 | $50.00 | $1.00 | $12.50 | Anthropic’s newest frontier model. Premium option for the most demanding reasoning and long-horizon agentic work. |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 | Lightweight, fast reasoning. Best for quick edits and small tasks. |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | $2.00 | Structural thinking, planning, debugging, and daily execution. |
| GPT-5.1 | $1.25 | $10.00 | $0.125 | $1.25 | Advanced reasoning and context. Good for medium-size tasks. |
| GPT-5.2 | $1.75 | $14.00 | $0.175 | $1.75 | Enhanced reasoning for complex tasks requiring long chains of thought. |
| GPT-5.4 | $2.50 | $15.00 | $0.25 | $2.50 | Advanced GPT generation. Smart and token-efficient. Standout computer use and multi-agent orchestration. |
| GPT-5.5 | $5.00 | $30.00 | $0.50 | $5.00 | Most advanced GPT option. Designed for highest-complexity coding and multi-agent orchestration. |
| Kimi K2.6 | $0.95 | $4.00 | $0.16 | $0.95 | Cheap agentic work. Smarter than Haiku at a fraction of frontier-model cost. |
| Prism (Claude + Gemini) | Variable (routed) | Variable (routed) | Variable (routed) | Variable (routed) | Routes among Claude Opus 4.7, Claude Sonnet 4.6, and Gemini 3.0 Flash. |
| Prism (GPT + Kimi) | Variable (routed) | Variable (routed) | Variable (routed) | Variable (routed) | Routes among GPT-5.5, GPT-5.4, and Kimi K2.6. |
Per-million-token rates follow each provider’s public API list price and change over time, so this table is a snapshot. Your dashboard at app.augmentcode.com always shows the live per-million-token rates for input, output, cache reads, and cache writes.
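To see how per-million-token rates translate into dollars, the sketch below prices the same token mix on three models from the table. The rates are copied from this snapshot and may be out of date; the token counts and the helper name `llm_token_cost` are hypothetical, and the result is the LLM portion only, before the service fee.

```python
# Snapshot of a few rows from the table above, in USD per million tokens.
# Rates change over time; treat these as illustrative, not live values.
RATES = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25},
    "GPT-5.1": {"input": 1.25, "output": 10.00, "cache_read": 0.125, "cache_write": 1.25},
}


def llm_token_cost(model, input_tokens=0, output_tokens=0,
                   cache_read_tokens=0, cache_write_tokens=0):
    """LLM token cost in USD for one model, excluding the service fee."""
    r = RATES[model]
    per_million = (
        input_tokens * r["input"]
        + output_tokens * r["output"]
        + cache_read_tokens * r["cache_read"]
        + cache_write_tokens * r["cache_write"]
    )
    return per_million / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens, no caching.
usage = dict(input_tokens=200_000, output_tokens=20_000)
for model in RATES:
    print(f"{model}: ${llm_token_cost(model, **usage):.2f}")
```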
Prism routed pricing
Prism uses variable pricing because each request is routed instead of billed against a fixed model rate. When you choose a Prism option, Augment selects the best-fit model from that family based on the task, context, and current system conditions, and the cost reflects the route used. Prism is designed to cost, on average, 20–30% less than frontier model costs, with minimal quality tradeoff. Exact savings depend on the task.
Example tasks
To illustrate how different models compare, here are examples of tasks and their approximate cost across each model. All costs include the 40% service fee on LLM usage and assume no Cosmos compute time.
Sonnet task: Fix a 500 error in an API endpoint
The /api/users/:id endpoint returns 500 errors when a user has no associated organization. Add null checking and return a 404 with a clear error message instead. Test that users with organizations still work correctly.
Cost: ~$0.25 with Sonnet 4.6 (or Sonnet 4.5)
| Model | Approx. cost | Notes |
|---|---|---|
| Sonnet 4.6 / 4.5 | $0.25 | Baseline |
| Opus 4.7 / 4.6 / 4.5 | $0.43 | Use for harder tasks |
| Claude Fable 5 | $0.85 | Premium option — 2× Opus pricing |
| GPT-5.2 | $0.34 | Use for harder tasks |
| GPT-5.4 | $0.18 | $0.07 saved |
| GPT-5.5 | $0.36 | Use for highest-complexity tasks |
| Gemini 3.1 Pro | $0.23 | $0.02 saved |
| Haiku 4.5 | $0.08 | $0.17 saved |
| GPT-5.1 | $0.19 | $0.06 saved |
| Kimi K2.6 | $0.13 | $0.12 saved |
| Prism (Claude + Gemini) | Variable routed cost | Savings vary by route |
| Prism (GPT + Kimi) | Variable routed cost | Savings vary by route |
Opus task: Design and implement a multi-tenant billing system
Our B2B SaaS platform needs a multi-tenant billing system. Design the database schema, implement the metering service, integrate with Stripe’s usage-based billing API, and handle edge cases like mid-cycle plan changes, prorations, and failed payments. Consider how this will scale to 10k+ tenants.
Cost: ~$0.85 with Opus 4.7 (or Opus 4.6, Opus 4.5)
Costs are illustrative for a standard medium-complexity task and include the 40% service fee on LLM usage. Actual cost depends on the model you choose, how much repo context the request loads, and whether the task uses Cosmos compute.
Monitoring token usage
You can track token consumption from Auggie CLI or the web dashboard.
In Auggie CLI
Monitor usage per session directly from the CLI as you work.
On the web
Visit app.augmentcode.com for detailed dashboards that show:
- Total usage by your team in dollars
- Usage per team member
- Breakdown by model, activity, and component (LLM, service fee, compute)
- Usage trends over time
- Remaining included usage and pay-as-you-go spend
Understanding your usage breakdown
Usage is organized by model, activity type, and component (LLM, service fee, or compute).
Session types
| Activity | What it is |
|---|---|
| CliAgent | Command-line session (Auggie CLI) |
| Cosmos | Sandboxed compute session running long-form work |
Optional features
| Activity | What it is |
|---|---|
| Prompt Enhancer | Token usage from improving your prompts before sending them. |
| Code Review | Automated code review for pull requests (when enabled for your repository). |
| Skills | Token usage attributed to a specific Skill invoked during a session. |
Background activities
Augment performs lightweight processing in the background to keep your experience smooth.
| Activity | What it does |
|---|---|
| Context Compression | Summarizes conversation history to keep long sessions fast and responsive. |
| System | General background processing that helps Augment work smoothly. |
Tips for optimizing token usage
- Match the model to the task. Use Haiku for simple tasks, GPT-5.1 for medium tasks, Sonnet for complex work, Opus for the hardest challenges, GPT-5.4 for computer use and multi-agent orchestration, GPT-5.5 for the highest-complexity coding workflows, and Prism when you want Augment to balance quality and cost by routing within a curated model family.
- Be specific in your prompts. Clear, detailed instructions help models work more efficiently and use fewer output tokens.
- Lean on prompt caching. Augment automatically caches stable context (your repo index, AGENTS.md, recently used files). Cached input tokens are billed at the provider’s reduced cache read rate, typically around 10% of the input rate, and because the service fee is 40% of LLM spend, it shrinks along with them.
- Break down large tasks. Sometimes splitting a complex task into smaller ones is more token-efficient than planning and executing everything in one session.
- Scope your Cosmos sessions. Compute is billed at $0.19/hour in 5-minute increments with no service fee. Closing a session when you’re done releases the compute and stops the meter at the next increment boundary.
- Review usage patterns. Check your dashboards regularly to identify optimization opportunities. Team admins can set guidelines on which models to use for which work.
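As a rough illustration of the prompt caching tip above, the sketch below compares the input-side cost of a request with and without cache hits. It uses the Claude Sonnet 4.6 rates from the table as a snapshot; the 80% cache-hit fraction and the function name are hypothetical, and one-time cache write costs are ignored for simplicity.

```python
SERVICE_FEE_RATE = 0.40  # applied on top of LLM usage


def input_cost_with_fee(total_input_tokens: int, cached_fraction: float,
                        input_rate: float, cache_read_rate: float) -> float:
    """Input-side cost in USD, including the service fee.

    Rates are USD per million tokens. Cache write costs are not modeled.
    """
    cached = total_input_tokens * cached_fraction
    fresh = total_input_tokens - cached
    llm = (fresh * input_rate + cached * cache_read_rate) / 1_000_000
    return llm * (1 + SERVICE_FEE_RATE)


# Claude Sonnet 4.6 snapshot rates: $3.00 input, $0.30 cache read.
uncached = input_cost_with_fee(1_000_000, 0.0, 3.00, 0.30)
mostly_cached = input_cost_with_fee(1_000_000, 0.8, 3.00, 0.30)
print(f"no cache hits:  ${uncached:.2f}")
print(f"80% cache hits: ${mostly_cached:.2f}")
```

Under these assumptions, one million input tokens cost about $4.20 uncached and about $1.18 when 80% of them are served from cache.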
Frequently asked questions
How is the cost of a single request calculated?
Each request is billed as the sum of input tokens × input rate, output tokens × output rate, cache reads × cache read rate, and cache writes × cache write rate (all rates per million tokens), plus a 40% service fee on that LLM total, plus billed Cosmos compute time (rounded up to 5-minute increments) × $0.19/hour if compute was used. Each request appears as a line item in your dashboard with the component breakdown.
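The calculation described in this answer can be sketched as follows. This is an illustration under stated assumptions, not Augment's billing implementation: the `Rates` class and `request_cost` function are hypothetical names, the example rates are the Claude Sonnet 4.6 snapshot from the table, and compute time is passed in already rounded to 5-minute increments.

```python
from dataclasses import dataclass

SERVICE_FEE_RATE = 0.40        # on LLM usage only
COMPUTE_RATE_PER_HOUR = 0.19   # USD per hour, no service fee


@dataclass
class Rates:
    """Provider list prices in USD per million tokens."""
    input: float
    output: float
    cache_read: float
    cache_write: float


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0,
                 billed_compute_minutes: int = 0) -> dict:
    """Break one request's cost into LLM, service fee, and compute (USD)."""
    llm = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cache_read_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    ) / 1_000_000
    service_fee = llm * SERVICE_FEE_RATE
    compute = billed_compute_minutes / 60 * COMPUTE_RATE_PER_HOUR
    return {"llm": llm, "service_fee": service_fee, "compute": compute,
            "total": llm + service_fee + compute}


sonnet = Rates(input=3.00, output=15.00, cache_read=0.30, cache_write=3.75)
cost = request_cost(sonnet, input_tokens=50_000, output_tokens=5_000,
                    cache_read_tokens=100_000)
print({name: round(value, 4) for name, value in cost.items()})
```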
What does the 40% service fee cover?
The service fee covers the Context Engine — which hosts your codebase and dynamically indexes changes as you work — and the Cosmos platform that orchestrates your sessions, manages model fallback and concurrency, and runs the surrounding infrastructure.
Is there a service fee on compute?
No. Cosmos compute is billed at $0.19/hour with no service fee on top.
Does unused included usage roll over?
No. The $100 of included usage resets at the start of each billing cycle.
Is included usage prorated?
Yes. If you start a Business plan mid-cycle, included usage is prorated for the remainder of the billing period.
What happens when I use up my $100?
Billing continues automatically on pay-as-you-go at the same rates. There’s no need to top up manually, and there’s no minimum spend.
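A small sketch of how a month's usage splits between included usage and pay-as-you-go, assuming a full billing cycle with no proration. The function names are hypothetical.

```python
MONTHLY_PRICE = 100.00    # Business plan price in USD
INCLUDED_USAGE = 100.00   # usage covered by the plan each cycle, in USD


def split_usage(total_usage: float) -> tuple:
    """Return (usage covered by the plan, pay-as-you-go overage) in USD."""
    covered = min(total_usage, INCLUDED_USAGE)
    overage = max(total_usage - INCLUDED_USAGE, 0.0)
    return covered, overage


def monthly_bill(total_usage: float) -> float:
    """Plan price plus any pay-as-you-go overage, in USD."""
    _, overage = split_usage(total_usage)
    return MONTHLY_PRICE + overage


print(split_usage(135.0))   # (100.0, 35.0)
print(monthly_bill(135.0))  # 135.0
print(monthly_bill(80.0))   # 100.0
```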
Can I switch models mid-conversation?
Yes. In Auggie CLI, switch models mid-conversation with the /model command. Each message consumes tokens based on the model selected for that specific message.
Where do I see live per-million-token rates?
Visit app.augmentcode.com and open Usage → Models. The dashboard always shows current input, output, cache read, and cache write rates for every available model.
Related resources
Available Models
Learn about the different AI models available in Augment.
Pricing Plans
View current pricing plans and what’s included on each.
Teams Admin Guide
Manage team subscriptions and billing.
Usage Dashboards & Budgets
Monitor token consumption and set budgets across your organization.