Now saving tokens for developers worldwide

Cut your AI coding costs by 40%

Trimli compresses every prompt before it reaches the model. Same quality responses, dramatically lower bills. Works silently with your existing tools.

trimli proxy running on :8765
$ curl -X POST localhost:8765/v1/chat/completions ...

input tokens:  28,400 → 16,900
compression:  40.5% saved
strategies:  whitespace, dedup, intent-distill, reference-sub
cost delta:  -$0.029 this request

response:  streamed unchanged (238 tokens, 1.2s)
40%
average token savings
20K+
tokens per agentic request
59/59
accuracy tests passed
0
prompts stored
How it works
Install. Point. Save.
No SDK changes. No prompt modifications. Three steps, under a minute.
01

Install the extension

Search "Trimli AI" in VS Code Marketplace. A local proxy starts automatically on localhost:8765.

02

Point your AI tool

Set your tool's base URL to localhost:8765. For Claude Code, just enable forward proxy mode.
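For most OpenAI-compatible tools this is a single environment variable. The variable names below are common SDK conventions shown as illustration, not Trimli-specific setup; check your tool's documentation for the exact setting.

```shell
# Tools built on the OpenAI SDKs read OPENAI_BASE_URL
export OPENAI_BASE_URL="http://localhost:8765/v1"

# Claude Code honors ANTHROPIC_BASE_URL for a custom endpoint
export ANTHROPIC_BASE_URL="http://localhost:8765"
```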

03

Watch savings grow

The status bar shows live savings. Click to open the full dashboard with per-request history and cost breakdown.

Compatibility
Works with your tools
Any tool that supports a custom base URL works automatically.
Claude Code: Fully supported
Continue: Fully supported
Cline: Fully supported
GitHub Copilot: Partial (LM API)
Cursor: Not supported
Windsurf: Not supported
Under the hood
6 compression strategies
Each request runs through a pipeline of strategies, cheapest first. No LLM calls needed for compression.

Whitespace normalize

Lossless cleanup of extra spaces, blank lines, and indentation noise.

3-8% savings

Deduplicate

Removes repeated sentences and paragraphs across conversation turns.

5-20% savings

Intent distill

Strips filler words from user queries while preserving the core ask.

10-30% savings

Reference substitute

Aliases repeated long strings (file paths, URLs, class names) with short refs.

10-25% savings

History summarize

Compresses older conversation turns into decision summaries. No LLM needed.

30-50% savings

Context prune

Drops low-relevance messages when context window is nearly full.

20-40% savings
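The cheapest-first idea can be sketched with toy versions of three of the strategies above. This is an illustrative sketch under simple assumptions (exact-match dedup, a 20-character threshold for reference aliasing), not Trimli's actual pipeline:

```python
import re
from collections import Counter

def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and excess blank lines."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

def deduplicate(text: str) -> str:
    """Drop exact repeated lines, keeping the first occurrence."""
    seen, out = set(), []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        out.append(line)
    return "\n".join(out)

def reference_substitute(text: str, min_len: int = 20) -> str:
    """Alias long repeated tokens (paths, URLs) with short refs."""
    counts = Counter(t for t in text.split() if len(t) >= min_len)
    legend = {}
    for i, (tok, n) in enumerate(counts.items()):
        if n >= 2:
            ref = f"$R{i}"
            legend[ref] = tok
            text = text.replace(tok, ref)
    if legend:
        header = "\n".join(f"{r} = {v}" for r, v in legend.items())
        text = header + "\n" + text
    return text

# Cheapest strategies run first, as the pipeline description suggests
PIPELINE = [normalize_whitespace, deduplicate, reference_substitute]

def compress(prompt: str) -> str:
    for strategy in PIPELINE:
        prompt = strategy(prompt)
    return prompt
```

Each stage is pure string manipulation, which is why no LLM call is needed: the expensive strategies only see whatever text the cheap ones could not already shrink.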
ROI Calculator
See what your team could save
Drag the sliders. Based on real compression benchmarks across 59 accuracy tests.
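The arithmetic behind the calculator is simple to check by hand. A sketch assuming $3 per million input tokens and 22 workdays per month (both illustrative; model prices vary):

```python
def monthly_savings(devs, requests_per_day, tokens_per_request=20_000,
                    price_per_million=3.00, savings_rate=0.40, workdays=22):
    """Estimated monthly savings for a team, in dollars."""
    daily_tokens = devs * requests_per_day * tokens_per_request
    daily_cost = daily_tokens / 1_000_000 * price_per_million
    return daily_cost * savings_rate * workdays

# 10 developers at 75 requests/day
print(round(monthly_savings(10, 75), 2))  # → 396.0
```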
Pricing
Start free. Scale when ready.
Same optimization quality on every tier. Pro removes the daily cap.
Free
$0
Free forever
  • All 6 optimization strategies
  • LLMLingua-2 compression
  • 200K token savings / day
  • Basic usage dashboard
  • No account required
Get started
Enterprise
$30/seat/mo
For teams of 5+
  • Everything in Pro
  • Team shared context pools
  • SSO (Okta / Azure AD)
  • Audit logs + CFO reports
  • On-premise Docker / Helm
Contact sales
FAQ
Common questions
Does Trimli store my prompts or API keys?
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to the model provider. The optimizer runs entirely on your machine.
Does compression affect response quality?
No. We ran a 59-test accuracy suite comparing optimized vs. unoptimized responses using an LLM judge. Result: 46% average compression with zero quality degradation. The optimizer removes redundancy — filler, repeated context, stale history — not meaning.
How many tokens do developers actually use per day?
In agentic workflows (Claude Code, Cline), each request sends ~20,000 input tokens including the system prompt, file context, and conversation history. A developer making 75-150 requests/day consumes 1.5M-3M input tokens daily — that's $4.50-$45/day depending on model.
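The quoted daily range follows directly from those numbers. A quick check, using illustrative per-million input prices for a cheap and a premium model tier:

```python
TOKENS_PER_REQUEST = 20_000  # typical agentic request, per the figure above

def daily_cost(requests_per_day, price_per_million):
    """Dollars per day of input tokens at a given per-million rate."""
    tokens = requests_per_day * TOKENS_PER_REQUEST
    return tokens / 1_000_000 * price_per_million

low = daily_cost(75, 3.00)     # light usage, cheaper model
high = daily_cost(150, 15.00)  # heavy usage, premium model
print(low, high)  # → 4.5 45.0
```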
Can my team self-host it?
Yes. Enterprise includes Docker Compose and Helm chart deployments. The Python compression service, Redis, PostgreSQL, and Nginx are all containerized and ready to deploy in your VPC. Zero data leaves your network.
How is this different from token monitoring tools?
Most alternatives (Tokenlint, Claude Token Monitor, Copilot Token Tracker) show you how many tokens you use. They don't reduce them. Trimli actively compresses every prompt before it reaches the model. It's the difference between a fuel gauge and a fuel-efficient engine.
Does it work with streaming responses?
Yes. Trimli only modifies the input (your prompts). Streaming responses pass through completely unchanged.
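Conceptually, a proxy like this rewrites only the request body and relays response chunks byte-for-byte. A toy sketch of that split (not Trimli's code; the function names are hypothetical):

```python
def rewrite_request(body: dict, compress) -> dict:
    """Compress only the input messages; all other fields pass through."""
    out = dict(body)
    out["messages"] = [
        {**m, "content": compress(m["content"])} for m in body["messages"]
    ]
    return out

def relay_stream(chunks):
    """Response chunks are yielded unchanged, preserving streaming."""
    yield from chunks
```

Because the response path is a pure passthrough, streaming latency and token-by-token delivery behave exactly as they would without the proxy.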

Stop overpaying for AI tokens

Install in 30 seconds. Start saving immediately.