AMF prefix elimination + Spec V2 speculative decode.
One click to deploy. Zero data leaves your infrastructure.
95% at 128K
99.9% hit rate
49% decode speedup
1 click to deploy
Currently in private early access
Benchmarks available on request.
How it works
Two engines. One integration.
AMF Prefix Elimination
System prompts, context windows, and repeated prefixes are computed once and cached as KV snapshots. 100% deterministic. 99.9% hit rate over a 1,000-run soak test.
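To make the caching idea concrete, here is a minimal sketch of a prefix cache keyed by a hash of the prefix token IDs. It is an illustration only, not the AMF implementation: the `PrefixCache` class, the `fake_prefill` stand-in, and the token values are all hypothetical, and a real engine would store attention KV tensors rather than a tuple.

```python
import hashlib
from typing import Callable, Dict, List


class PrefixCache:
    """Toy prefix cache: maps a hash of prefix token IDs to a stored snapshot."""

    def __init__(self, compute_kv: Callable[[List[int]], object]):
        self._compute_kv = compute_kv  # stand-in for the expensive prefill pass
        self._store: Dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tokens: List[int]) -> str:
        # Deterministic key: identical token IDs always give the same digest.
        return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()

    def get(self, prefix_tokens: List[int]) -> object:
        key = self._key(prefix_tokens)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._compute_kv(prefix_tokens)
        return self._store[key]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


prefill_calls = []


def fake_prefill(tokens: List[int]) -> tuple:
    # Hypothetical stand-in for computing a KV snapshot.
    prefill_calls.append(len(tokens))
    return tuple(t * 2 for t in tokens)


cache = PrefixCache(fake_prefill)
system_prompt = [101, 7, 42, 9]  # made-up token IDs
for _ in range(10):
    cache.get(system_prompt)

print(len(prefill_calls), cache.hit_rate())
```

In this toy run the prefix is computed once and the remaining nine requests are served from the cache.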
Spec V2 Decode Acceleration
Clean draft-verify loop with dynamic k adaptation. 49% decode speedup. Output is bit-for-bit identical to baseline.
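A draft-verify loop with an adaptive draft length k can be sketched as follows. This is a generic greedy speculative decoding toy, not Spec V2 itself: the `target` and `draft` functions are made-up deterministic stand-ins, and the rule of growing k after a fully accepted draft and shrinking it after a rejection is an assumption chosen for illustration. The toy also calls the target once per token, so it shows why the output matches the baseline but not where the speedup comes from; a real engine would verify all k draft tokens in one batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: the target model picks every token, one at a time."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n: int, k: int = 4, k_min: int = 1, k_max: int = 8) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n:
        budget = min(k, n - (len(out) - len(prompt)))
        # Draft step: the cheap model proposes up to `budget` tokens.
        ctx, proposal = list(out), []
        for _ in range(budget):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # Verify step: always emit the target's token; stop at the first mismatch.
        accepted = 0
        for guess in proposal:
            truth = target(out)
            out.append(truth)
            if truth != guess:
                break
            accepted += 1
        # Adapt k: grow after a fully accepted draft, shrink after a rejection.
        k = min(k_max, k + 1) if accepted == len(proposal) else max(k_min, k - 1)
    return out[len(prompt):]


def target(ctx: List[int]) -> int:
    # Hypothetical deterministic "large model".
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft(ctx: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except at every third position.
    t = target(ctx)
    return t if len(ctx) % 3 else (t + 1) % 50


baseline = greedy_decode(target, [3, 1, 4], 20)
fast = speculative_decode(target, draft, [3, 1, 4], 20)
print(fast == baseline)
```

Because every emitted token comes from the target, the sequence is identical to plain greedy decoding regardless of how often the draft is wrong.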
Zero-Data Architecture
Prompts, outputs, weights, and KV cache never leave your infrastructure. Only numerical metrics are sent, for billing. An isolation layer inspects every outbound payload to block data leaks.
One-Click Deploy
Works with vLLM, SGLang, TensorRT-LLM, and llama.cpp. Auto-detects runtime and model family. Fully automated setup.
Real-Time Savings Dashboard
See exactly how much compute and cost you're saving. AMF hit rates, Spec V2 acceptance rates, TPS improvements — all in real time.
Air-Gapped Mode
Annual license key for fully offline operation. Zero network calls. Used by financial institutions, healthcare providers, and other regulated industries.
Context scaling
Longer context, higher savings
Before vs after
The numbers speak
[Before/after comparison at 128K context window: prefix compute, decode TPS, total compute, data transmitted]
Automated deployment
One click. Fully automated.
The Axropus app detects your runtime, pairs the right draft model, wraps your engine, and verifies output — all automatically.
- Detecting runtime
- Identifying model
- Pairing draft model
- Initializing AMF engine
- Wrapping inference engine
- Running verification
- Live: saving up to 95% compute
Auto-detect
Runtime & model identified automatically
Zero config
No YAML, no ENV vars, no setup scripts
Verified output
Bit-for-bit match confirmed before going live
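One way such a pre-launch check could work is to run the same prompts through the baseline and the wrapped engine and compare digests of the output token IDs. The sketch below assumes that approach; the `digest` and `mismatches` helpers and the toy engines are hypothetical and do not describe how the Axropus app performs its verification.

```python
import hashlib
from typing import Callable, List

Engine = Callable[[List[int]], List[int]]


def digest(token_ids: List[int]) -> str:
    """SHA-256 over the raw bytes of the output token IDs."""
    h = hashlib.sha256()
    for t in token_ids:
        h.update(t.to_bytes(4, "little"))
    return h.hexdigest()


def mismatches(baseline: Engine, accelerated: Engine,
               prompts: List[List[int]]) -> List[List[int]]:
    """Return every prompt whose accelerated output differs from the baseline."""
    return [p for p in prompts if digest(baseline(p)) != digest(accelerated(p))]


def baseline_engine(prompt: List[int]) -> List[int]:
    # Toy stand-in for the unmodified engine.
    return [t + 1 for t in prompt]


def wrapped_engine(prompt: List[int]) -> List[int]:
    # Toy stand-in for a wrapped engine that preserves output.
    return [t + 1 for t in prompt]


def broken_engine(prompt: List[int]) -> List[int]:
    # Toy stand-in for a wrapped engine that corrupts the last token.
    return [t + 1 for t in prompt][:-1] + [0]


prompts = [[1, 2, 3], [10, 20], [7]]
ok_to_go_live = not mismatches(baseline_engine, wrapped_engine, prompts)
print(ok_to_go_live, len(mismatches(baseline_engine, broken_engine, prompts)))
```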
Zero-data architecture
Your data never leaves
Not encrypted-in-transit. Not anonymized. Never transmitted at all. An isolation layer inspects every outbound payload.
Stays on your infrastructure
- ✓ All model weights
- ✓ All prompts & responses
- ✓ All KV cache data
- ✓ All AMF prefix snapshots
- ✓ All Spec V2 operations
- ✓ Audit logs
Numbers only (for billing)
- # Token counts (for billing)
- # TPS measurements
- # AMF hit rate percentage
- # Spec V2 acceptance rate
- # GPU count & model family
- # SDK version
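An outbound filter for the metrics listed above could be sketched as an allowlist with type checks. The field names, the length cap on string fields, and the `check_outbound` function are all assumptions made for illustration; the page does not document the real payload schema or how the isolation layer is built.

```python
from numbers import Real

# Hypothetical field names mirroring the list above; the real schema is not documented here.
ALLOWED_FIELDS = {
    "token_count": int,
    "tps": Real,
    "amf_hit_rate": Real,
    "spec_v2_acceptance_rate": Real,
    "gpu_count": int,
    "model_family": str,
    "sdk_version": str,
}
MAX_STRING_LENGTH = 64  # assumed cap so string fields cannot carry prompt text


def check_outbound(payload: dict) -> dict:
    """Return the payload unchanged, or raise ValueError if anything is off the allowlist."""
    for key, value in payload.items():
        expected = ALLOWED_FIELDS.get(key)
        if expected is None:
            raise ValueError(f"field not allowed: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
        if isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
            raise ValueError(f"string too long for field: {key}")
    return payload


metrics = {"token_count": 120000, "tps": 84.2, "amf_hit_rate": 0.999, "gpu_count": 8}
sent = check_outbound(metrics)

try:
    check_outbound({"token_count": 10, "prompt": "confidential text"})
    blocked = False
except ValueError:
    blocked = True
print(sent == metrics, blocked)
```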
Tamper-evident audit log — every SDK operation is hash-chained and stored locally for your security team to review.
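Hash chaining is a standard technique: each entry stores the hash of the previous entry, so editing any record invalidates every hash after it. The sketch below shows the general idea with hypothetical helper names and made-up operation strings; the actual log format used by the SDK is not specified on this page.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _entry_hash(operation: str, prev_hash: str) -> str:
    body = json.dumps({"op": operation, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: List[Dict[str, str]], operation: str) -> None:
    """Append an entry whose hash covers the operation and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"op": operation, "prev": prev_hash,
                "hash": _entry_hash(operation, prev_hash)})


def verify_chain(log: List[Dict[str, str]]) -> bool:
    """Recompute every hash from the start; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(entry["op"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict[str, str]] = []
for op in ["engine_wrapped", "prefix_cached", "metrics_sent"]:  # made-up operations
    append_entry(audit_log, op)

intact = verify_chain(audit_log)
audit_log[1]["op"] = "nothing_happened"  # simulate tampering
tampered_detected = not verify_chain(audit_log)
print(intact, tampered_detected)
```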
Savings calculator
How much will you save?
Monthly projection: current monthly spend, spend with Axropus, and monthly savings.
Based on up to 95% compute reduction at 128K replay-heavy workloads.
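The projection arithmetic can be sketched as below, assuming spend scales linearly with compute. That linearity is an assumption, and 95% is the upper bound quoted for 128K replay-heavy workloads, so other workloads should use a lower `compute_reduction`. The function name and the example spend are hypothetical.

```python
def monthly_projection(current_spend: float, compute_reduction: float) -> dict:
    """Project monthly spend, assuming cost is proportional to compute used."""
    if current_spend < 0:
        raise ValueError("current_spend must be non-negative")
    if not 0.0 <= compute_reduction <= 1.0:
        raise ValueError("compute_reduction must be between 0 and 1")
    savings = round(current_spend * compute_reduction, 2)
    return {
        "current_monthly_spend": round(current_spend, 2),
        "with_reduction": round(current_spend - savings, 2),
        "monthly_savings": savings,
        "savings_percent": round(compute_reduction * 100, 2),
    }


# Example: a made-up $10,000 monthly bill at the quoted upper bound of 95%.
best_case = monthly_projection(10_000.0, 0.95)
# A more conservative, equally hypothetical 40% reduction for comparison.
conservative = monthly_projection(10_000.0, 0.40)
print(best_case)
print(conservative)
```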
Ready to start saving?
Cut inference costs by up to 95% today
3 lines of code. 5-minute deployment. Zero data leaves your infrastructure. Net-positive from day one.
founders@axropus.com — we reply in under 2 hours