Overview

Augment uses token-based pricing. Instead of buying a pool of credits and learning a separate conversion table, you pay for the actual resources your work consumes. Every request is billed as the sum of three components:
  • LLM tokens at the model provider’s public API list price
  • Compute for time Cosmos spends running your work, at our published per-hour rate
  • A flat 40% service fee on LLM usage, which funds the Context Engine and the Cosmos platform
This gives you the flexibility to choose the right model for each task, with full transparency on what every request costs.

How tokens are billed

When you use Cosmos or Auggie CLI, three things can cost money:
  • LLM tokens — every model call uses input and output tokens. You’re billed at the model provider’s public API list price.
  • Service fee — Augment adds a 40% service fee on LLM usage. There is no service fee on compute. The service fee covers the Context Engine that hosts and dynamically indexes your codebase, and the Cosmos platform that runs your sessions.
  • Cosmos compute — when Cosmos runs work in a sandbox, you pay for the compute time it consumes at $0.19 per hour, billed in 5-minute increments (rounded up). A session that uses 7 minutes of compute is billed as 10 minutes; a session that uses 16 minutes is billed as 20 minutes.
All three components draw from the same usage balance.
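The 5-minute rounding rule for compute can be sketched in a few lines. This is a minimal illustration that assumes rounding is applied once per session, as in the examples above; the function names are hypothetical and not part of any Augment API.

```python
import math

COMPUTE_RATE_PER_HOUR = 0.19  # dollars per hour, from the pricing above
INCREMENT_MINUTES = 5         # billing granularity; partial increments round up


def billed_compute_minutes(used_minutes: float) -> int:
    """Round a session's compute time up to the next 5-minute increment."""
    if used_minutes <= 0:
        return 0
    return math.ceil(used_minutes / INCREMENT_MINUTES) * INCREMENT_MINUTES


def compute_cost(used_minutes: float) -> float:
    """Dollar cost of a session's compute. No service fee is added."""
    return billed_compute_minutes(used_minutes) / 60 * COMPUTE_RATE_PER_HOUR


print(billed_compute_minutes(7))    # 10
print(billed_compute_minutes(16))   # 20
print(f"${compute_cost(16):.4f}")   # $0.0633
```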

The Business plan

Item | Included
Monthly price | $100 / month
Included usage | $100 across LLM tokens, compute, and service fees
Service fee | 40% of LLM usage (no fee on compute)
Seats | Up to 50, no per-seat charge
Products included | Cosmos and Auggie CLI
Cosmos compute rate | $0.19 / hour, billed in 5-minute increments
Top-ups | Pay-as-you-go at the same rates once included usage is consumed
If you need more than 50 seats or custom Cosmos setup, see Pricing Plans for Enterprise options.

A typical month on the Business plan

Here’s how a developer might spend their $100 of included usage in a typical month:
Component | Amount | What it is
LLM tokens | $60 | Input and output tokens at public API list price
Service fee | $24 | 40% of LLM token spend
Compute | $16 | Cosmos compute at $0.19/hour
Total | $100 |
Your actual mix depends on the models you choose, how much you use Cosmos, and the size of the tasks you run.
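To see how the split in the table follows from the pricing rules, the sketch below works backwards from a total budget and a compute amount. Because the service fee is 40% of LLM spend, whatever is left after compute covers LLM tokens and the fee together, so LLM spend is that remainder divided by 1.4. The function name is hypothetical.

```python
SERVICE_FEE_RATE = 0.40        # applied to LLM token spend only
COMPUTE_RATE_PER_HOUR = 0.19   # dollars per hour of Cosmos compute


def split_budget(total: float, compute: float) -> dict[str, float]:
    """Split a dollar budget into LLM tokens, service fee, and compute."""
    llm = (total - compute) / (1 + SERVICE_FEE_RATE)
    return {
        "llm_tokens": round(llm, 2),
        "service_fee": round(llm * SERVICE_FEE_RATE, 2),
        "compute": round(compute, 2),
    }


month = split_budget(total=100.0, compute=16.0)
print(month)  # {'llm_tokens': 60.0, 'service_fee': 24.0, 'compute': 16.0}

# $16 of compute corresponds to roughly 84 billed hours at $0.19/hour.
print(round(16.0 / COMPUTE_RATE_PER_HOUR, 1))  # 84.2
```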

Top-ups

Once you’ve used your $100 of included usage, billing continues automatically on pay-as-you-go at the same rates: LLM tokens at public API list price, the 40% service fee on LLM, and compute at $0.19/hour. There’s no minimum top-up amount.
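As a rough illustration of how included usage and pay-as-you-go combine, the sketch below assumes the monthly plan price is charged in full and that anything beyond the $100 of included usage is billed dollar for dollar. The `total_usage` figure is assumed to already contain LLM tokens, service fee, and compute; the function name is hypothetical.

```python
PLAN_PRICE = 100.0       # Business plan monthly price
INCLUDED_USAGE = 100.0   # usage covered by the plan price


def monthly_invoice(total_usage: float) -> dict[str, float]:
    """Plan price plus any usage beyond the included amount."""
    pay_as_you_go = max(0.0, total_usage - INCLUDED_USAGE)
    return {
        "plan_price": PLAN_PRICE,
        "pay_as_you_go": round(pay_as_you_go, 2),
        "total": round(PLAN_PRICE + pay_as_you_go, 2),
    }


print(monthly_invoice(80.0))    # {'plan_price': 100.0, 'pay_as_you_go': 0.0, 'total': 100.0}
print(monthly_invoice(137.5))   # {'plan_price': 100.0, 'pay_as_you_go': 37.5, 'total': 137.5}
```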

Token costs by model

Different models consume your usage balance at different rates based on their underlying provider pricing. All rates below are per million tokens, at the provider’s public API list price. GPT model rates are the standard (default) API tier.
Model | Input | Output | Cache read | Cache write | Best for
Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 | Latest generation Sonnet. Balanced capability, ideal for medium or large tasks and multi-step work.
Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 | $3.75 | Previous-generation Sonnet. Same pricing as 4.6; consider upgrading to 4.6 for the latest improvements.
Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 | Most capable model. Best for long-running tasks, deep reasoning, and opinionated code generation.
Claude Opus 4.6 / 4.5 | $5.00 | $25.00 | $0.50 | $6.25 | Previous-generation Opus models. Same pricing as 4.7.
Claude Fable 5 | $10.00 | $50.00 | $1.00 | $12.50 | Anthropic’s newest frontier model. Premium option for the most demanding reasoning and long-horizon agentic work.
Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 | Lightweight, fast reasoning. Best for quick edits and small tasks.
Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | $2.00 | Structural thinking, planning, debugging, and daily execution.
GPT-5.1 | $1.25 | $10.00 | $0.125 | $1.25 | Advanced reasoning and context. Good for medium-size tasks.
GPT-5.2 | $1.75 | $14.00 | $0.175 | $1.75 | Enhanced reasoning for complex tasks requiring long chains of thought.
GPT-5.4 | $2.50 | $15.00 | $0.25 | $2.50 | Advanced GPT generation. Smart and token-efficient. Standout computer use and multi-agent orchestration.
GPT-5.5 | $5.00 | $30.00 | $0.50 | $5.00 | Most advanced GPT option. Designed for highest-complexity coding and multi-agent orchestration.
Kimi K2.6 | $0.95 | $4.00 | $0.16 | $0.95 | Cheap agentic work. Smarter than Haiku at a fraction of frontier-model cost.
Prism (Claude + Gemini) | Variable (routed) | Variable (routed) | Variable (routed) | Variable (routed) | Routes among Claude Opus 4.7, Claude Sonnet 4.6, and Gemini 3.0 Flash.
Prism (GPT + Kimi) | Variable (routed) | Variable (routed) | Variable (routed) | Variable (routed) | Routes among GPT-5.5, GPT-5.4, and Kimi K2.6.
Per-million-token rates follow each provider’s public API list price and change over time, so this table is a snapshot. Your dashboard at app.augmentcode.com always shows the live per-million-token rates for input, output, cache reads, and cache writes.
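The per-million-token rates translate into dollars as tokens × rate ÷ 1,000,000 for each token type. The sketch below applies that to three models using the snapshot rates from the table. The token counts are made up for illustration, and holding them constant across models is a simplifying assumption, since different models tokenize text differently and may use more or fewer tokens for the same task. The result is LLM token cost only, before the service fee.

```python
# Per-million-token list prices copied from the table above (a snapshot).
RATES = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25},
    "GPT-5.1": {"input": 1.25, "output": 10.00, "cache_read": 0.125, "cache_write": 1.25},
}


def llm_token_cost(model: str, tokens: dict[str, int]) -> float:
    """LLM token cost in dollars, before the 40% service fee."""
    rates = RATES[model]
    return sum(count * rates[kind] / 1_000_000 for kind, count in tokens.items())


# Hypothetical token counts for one request.
usage = {"input": 50_000, "output": 5_000, "cache_read": 200_000, "cache_write": 0}

for model in RATES:
    print(f"{model}: ${llm_token_cost(model, usage):.4f}")
# Claude Sonnet 4.6: $0.2850
# Claude Haiku 4.5: $0.0950
# GPT-5.1: $0.1375
```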

Prism routed pricing

Prism uses variable pricing because each request is routed instead of billed against a fixed model rate. When you choose a Prism option, Augment selects the best-fit model from that family based on the task, context, and current system conditions, and the cost reflects the route used. Prism is designed to cost, on average, 20–30% less than frontier model costs, with minimal quality tradeoff. Exact savings depend on the task.

Example tasks

To illustrate how the models compare, here are example tasks with their approximate cost on each model. All costs include the 40% service fee on LLM usage and assume no Cosmos compute time.

Sonnet task: Fix a 500 error in an API endpoint

The /api/users/:id endpoint returns 500 errors when a user has no associated organization. Add null checking and return a 404 with a clear error message instead. Test that users with organizations still work correctly.
Cost: ~$0.25 with Sonnet 4.6 (or Sonnet 4.5)
Model | Approx. cost | Notes
Sonnet 4.6 / 4.5 | $0.25 | Baseline
Opus 4.7 / 4.6 / 4.5 | $0.43 | Use for harder tasks
Claude Fable 5 | $0.85 | Premium option, 2× Opus pricing
GPT-5.2 | $0.34 | Use for harder tasks
GPT-5.4 | $0.18 | $0.07 saved
GPT-5.5 | $0.36 | Use for highest-complexity tasks
Gemini 3.1 Pro | $0.23 | $0.02 saved
Haiku 4.5 | $0.08 | $0.17 saved
GPT-5.1 | $0.19 | $0.06 saved
Kimi K2.6 | $0.13 | $0.12 saved
Prism (Claude + Gemini) | Variable routed cost | Savings vary by route
Prism (GPT + Kimi) | Variable routed cost | Savings vary by route

Opus task: Design and implement a multi-tenant billing system

Our B2B SaaS platform needs a multi-tenant billing system. Design the database schema, implement the metering service, integrate with Stripe’s usage-based billing API, and handle edge cases like mid-cycle plan changes, prorations, and failed payments. Consider how this will scale to 10k+ tenants.
Cost: ~$0.85 with Opus 4.7 (or Opus 4.6, Opus 4.5)
Costs are illustrative for a standard medium-complexity task and include the 40% service fee on LLM usage. Actual cost depends on the model you choose, how much repo context the request loads, and whether the task uses Cosmos compute.

Monitoring token usage

You can track token consumption from Auggie CLI or the web dashboard.

In Auggie CLI

Monitor usage per session directly from the CLI as you work.

On the web

Visit app.augmentcode.com for detailed dashboards that show:
  • Total usage by your team in dollars
  • Usage per team member
  • Breakdown by model, activity, and component (LLM, service fee, compute)
  • Usage trends over time
  • Remaining included usage and pay-as-you-go spend
Team administrators can access more detailed analytics to optimize usage and identify opportunities to save by using more efficient models for appropriate tasks.

Understanding your usage breakdown

Usage is organized by model, activity type, and component (LLM, service fee, or compute).
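If you copy line items out of the dashboard, the same three dimensions can be totaled with a small helper. The record layout and dollar figures below are invented for illustration and do not reflect an actual Augment export format.

```python
from collections import defaultdict

# Hypothetical line items; field names and amounts are assumptions.
rows = [
    {"model": "Claude Sonnet 4.6", "activity": "CliAgent", "component": "llm", "usd": 0.30},
    {"model": "Claude Sonnet 4.6", "activity": "CliAgent", "component": "service_fee", "usd": 0.12},
    {"model": "Claude Haiku 4.5", "activity": "Prompt Enhancer", "component": "llm", "usd": 0.05},
    {"model": "Claude Haiku 4.5", "activity": "Prompt Enhancer", "component": "service_fee", "usd": 0.02},
    {"model": None, "activity": "Cosmos", "component": "compute", "usd": 0.03},
]


def total_by(records: list[dict], key: str) -> dict:
    """Sum dollar amounts grouped by one dimension (model, activity, or component)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["usd"]
    return {group: round(amount, 2) for group, amount in totals.items()}


print(total_by(rows, "component"))  # {'llm': 0.35, 'service_fee': 0.14, 'compute': 0.03}
print(total_by(rows, "activity"))   # {'CliAgent': 0.42, 'Prompt Enhancer': 0.07, 'Cosmos': 0.03}
```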

Session types

Activity | What it is
CliAgent | Command-line session (Auggie CLI)
Cosmos | Sandboxed compute session running long-form work

Optional features

Activity | What it is
Prompt Enhancer | Token usage from improving your prompts before sending them.
Code Review | Automated code review for pull requests (when enabled for your repository).
Skills | Token usage attributed to a specific Skill invoked during a session.

Background activities

Augment performs lightweight processing in the background to keep your experience smooth.
Activity | What it does
Context Compression | Summarizes conversation history to keep long sessions fast and responsive.
System | General background processing that helps Augment work smoothly.
These background activities use a small fraction of your total token consumption.

Tips for optimizing token usage

  1. Match the model to the task. Use Haiku for simple tasks, GPT-5.1 for medium tasks, Sonnet for complex work, Opus for the hardest challenges, GPT-5.4 for computer use and multi-agent orchestration, GPT-5.5 for the highest-complexity coding workflows, and Prism when you want Augment to balance quality and cost by routing within a curated model family.
  2. Be specific in your prompts. Clear, detailed instructions help models work more efficiently and use fewer output tokens.
  3. Lean on prompt caching. Augment automatically caches stable context (your repo index, AGENTS.md, recently used files). Cached input tokens are billed at the provider’s reduced cache-read rate, typically around 10% of the input rate, and because the 40% service fee is calculated on LLM spend, it shrinks along with them.
  4. Break down large tasks. Sometimes splitting a complex task into smaller ones is more token-efficient than planning and executing everything in one session.
  5. Scope your Cosmos sessions. Compute is billed at $0.19/hour in 5-minute increments with no service fee. Closing a session when you’re done releases the compute and stops the meter at the next increment boundary.
  6. Review usage patterns. Check your dashboards regularly to identify optimization opportunities. Team admins can set guidelines on which models to use for which work.
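To get a feel for what caching is worth, the sketch below compares the input-side cost of one request with and without cache hits, using the Claude Sonnet 4.6 snapshot rates from the table above. The 200,000-token context and the 90% cache-hit fraction are hypothetical, and the one-time cache-write cost is left out to keep the comparison simple.

```python
INPUT_RATE = 3.00        # Claude Sonnet 4.6 input, dollars per million tokens
CACHE_READ_RATE = 0.30   # Claude Sonnet 4.6 cache read, dollars per million tokens
SERVICE_FEE_RATE = 0.40  # applied to LLM spend


def input_cost(total_input_tokens: int, cached_fraction: float) -> float:
    """Input-side cost including the service fee, for a given cache-hit fraction."""
    cached = total_input_tokens * cached_fraction
    fresh = total_input_tokens - cached
    llm = (fresh * INPUT_RATE + cached * CACHE_READ_RATE) / 1_000_000
    return llm * (1 + SERVICE_FEE_RATE)


print(f"${input_cost(200_000, 0.0):.4f}")  # $0.8400 with nothing cached
print(f"${input_cost(200_000, 0.9):.4f}")  # $0.1596 with 90% read from cache
```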

Frequently asked questions

How is each request billed?
Each request is billed as the sum of input tokens × input rate, output tokens × output rate, cache reads × cache read rate, and cache writes × cache write rate (all rates per million tokens), plus a 40% service fee on that LLM total, plus billed Cosmos compute time in hours × $0.19 if compute was used. Each request appears as a line item in your dashboard with the component breakdown.

What does the service fee cover?
The service fee covers the Context Engine, which hosts your codebase and dynamically indexes changes as you work, and the Cosmos platform, which orchestrates your sessions, manages model fallback and concurrency, and runs the surrounding infrastructure.

Is there a service fee on Cosmos compute?
No. Cosmos compute is billed at $0.19/hour with no service fee on top.

Does unused included usage roll over?
No. The $100 of included usage resets at the start of each billing cycle.

Is included usage prorated if I start mid-cycle?
Yes. If you start a Business plan mid-cycle, included usage is prorated for the remainder of the billing period.

What happens when I use up my included usage?
Billing continues automatically on pay-as-you-go at the same rates. There’s no need to top up manually, and there’s no minimum spend.

Can I switch models during a conversation?
Yes. In Auggie CLI, switch models mid-conversation with the /model command. Each message consumes tokens based on the model selected for that specific message.

Where can I see current model rates?
Visit app.augmentcode.com and open Usage → Models. The dashboard always shows current input, output, cache read, and cache write rates for every available model.
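The per-request billing formula can be written out as a short function. This is a sketch under stated assumptions: the class and function names are hypothetical, the token counts are invented, the rates are the Claude Opus 4.7 snapshot values from the table above, and compute is passed in as already-rounded billed minutes.

```python
from dataclasses import dataclass

SERVICE_FEE_RATE = 0.40        # on LLM usage only
COMPUTE_RATE_PER_HOUR = 0.19   # Cosmos compute, no service fee


@dataclass
class Rates:
    """Dollars per million tokens."""
    input: float
    output: float
    cache_read: float
    cache_write: float


@dataclass
class Tokens:
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0


def request_bill(tokens: Tokens, rates: Rates, billed_compute_minutes: int = 0) -> dict[str, float]:
    """Break one request into LLM, service fee, compute, and total (dollars)."""
    llm = (
        tokens.input * rates.input
        + tokens.output * rates.output
        + tokens.cache_read * rates.cache_read
        + tokens.cache_write * rates.cache_write
    ) / 1_000_000
    service_fee = llm * SERVICE_FEE_RATE
    compute = billed_compute_minutes / 60 * COMPUTE_RATE_PER_HOUR
    return {"llm": llm, "service_fee": service_fee, "compute": compute,
            "total": llm + service_fee + compute}


opus_4_7 = Rates(input=5.00, output=25.00, cache_read=0.50, cache_write=6.25)
bill = request_bill(
    Tokens(input=100_000, output=10_000, cache_read=400_000, cache_write=40_000),
    opus_4_7,
    billed_compute_minutes=10,
)
for component, usd in bill.items():
    print(f"{component}: ${usd:.4f}")
# llm: $1.2000
# service_fee: $0.4800
# compute: $0.0317
# total: $1.7117
```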

Available Models

Learn about the different AI models available in Augment.

Pricing Plans

View current pricing plans and what’s included on each.

Teams Admin Guide

Manage team subscriptions and billing.

Usage Dashboards & Budgets

Monitor token consumption and set budgets across your organization.