Proven on production workloads

AMF prefix elimination + Spec V2 speculative decoding. One-click deployment. Zero data leaves your infrastructure.

95%

Compute savings at 128K context

99.9%

AMF hit rate

49%

Decode speedup

1 click

To deploy

Currently in private early access

Benchmarks available on request.

How it works

Two engines. One integration.

AMF Prefix Elimination

System prompts, context windows, and other repeated prefixes are computed once and cached as KV snapshots. 100% deterministic. 99.9% hit rate over a 1,000-run soak test.

Eliminates all redundant prefix computation
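The prefix reuse described above follows the general idea of prefix (KV) caching. Here is a minimal sketch, assuming a cache keyed on a hash of the exact token prefix. `PrefixCache`, `get_or_compute`, and `fake_prefill` are hypothetical names for illustration, not the AMF engine's API, and the stored snapshot stands in for real per-layer key/value tensors.

```python
import hashlib
from typing import Callable, Dict, List

KVSnapshot = object  # placeholder: a real engine would store per-layer key/value tensors


class PrefixCache:
    """Toy prefix cache: one KV snapshot per distinct token prefix (hypothetical)."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, KVSnapshot] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(prefix: List[int]) -> str:
        # Deterministic key: identical token prefixes always hash to the same key.
        # The "," separator keeps [1, 23] and [12, 3] from colliding.
        return hashlib.sha256(",".join(map(str, prefix)).encode()).hexdigest()

    def get_or_compute(
        self, prefix: List[int], compute_kv: Callable[[List[int]], KVSnapshot]
    ) -> KVSnapshot:
        k = self.key(prefix)
        if k in self._snapshots:
            self.hits += 1  # reuse: no prefill compute for this prefix
        else:
            self.misses += 1  # first sighting: compute once and store
            self._snapshots[k] = compute_kv(prefix)
        return self._snapshots[k]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Usage: ten requests share the same (hypothetical) system prompt.
prefill_calls = 0


def fake_prefill(tokens: List[int]) -> KVSnapshot:
    global prefill_calls
    prefill_calls += 1  # count how often the expensive prefill actually runs
    return {"num_tokens": len(tokens)}


cache = PrefixCache()
system_prompt = [101, 7, 42, 9]
for _ in range(10):
    cache.get_or_compute(system_prompt, fake_prefill)
print(prefill_calls, cache.hits, cache.misses)  # prints: 1 9 1
```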

Spec V2 Decode Acceleration

A clean draft-verify loop with dynamic adaptation of k, the number of tokens drafted per step. 49% decode speedup. Output is bit-for-bit identical to baseline.

105 TPS vs 70 TPS baseline
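The draft-verify loop can be sketched with greedy verification, which is one standard way to keep output identical to the baseline. This is a toy under stated assumptions, not Spec V2 itself. The "models" are deterministic functions from context to next token, and the k-adaptation rule (grow by one after a fully accepted draft, shrink by one after a rejection) is a hypothetical policy. All function names are illustrative.

```python
from typing import Callable, List

# A "model" here is any deterministic function from context to next token id.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target-model step per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int], n: int,
                       k: int = 4, k_min: int = 1, k_max: int = 8) -> List[int]:
    """Greedy draft-verify loop with a simple adaptive draft length k (hypothetical policy)."""
    out = list(prompt)
    generated = 0
    while generated < n:
        # 1. Draft: the cheap model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, n - generated)):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)

        # 2. Verify: compare each proposed token with the target's greedy choice.
        #    A real engine scores all positions in one batched forward pass; this
        #    toy calls the target once per position for readability.
        accepted = 0
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always emit the target's own token
            generated += 1
            if token != expected:
                break  # first mismatch: discard the rest of the draft
            accepted += 1

        # 3. Adapt k: draft further after a full accept, pull back after a reject.
        if accepted == len(proposal):
            k = min(k + 1, k_max)
        else:
            k = max(k - 1, k_min)
    return out[len(prompt):]


# Usage with toy models.
def target(ctx: List[int]) -> int:  # stand-in for the large model
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft(ctx: List[int]) -> int:  # stand-in for the small model; wrong at every 5th position
    t = target(ctx)
    return t if len(ctx) % 5 else (t + 1) % 50


prompt = [1, 2, 3]
assert speculative_decode(target, draft, prompt, 20) == greedy_decode(target, prompt, 20)
```

Because every emitted token is the target model's own greedy choice, the result always matches `greedy_decode`. In a real engine the speedup comes from verifying all drafted positions in a single target forward pass, which this toy does not model.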

Zero-Data Architecture

Prompts, outputs, weights, and KV cache never leave your infrastructure. Only numerical metrics are sent, for billing. An isolation layer blocks all potential data leaks.

Not encrypted-in-transit. Never transmitted.

One-Click Deploy

Works with vLLM, SGLang, TensorRT-LLM, and llama.cpp. Auto-detects runtime and model family. Fully automated setup.

Automated deployment pipeline

Real-Time Savings Dashboard

See exactly how much compute and cost you're saving. AMF hit rates, Spec V2 acceptance rates, TPS improvements — all in real time.

Local dashboard on port 8470

Air-Gapped Mode

An annual license key enables fully offline operation. Zero network calls. Used by financial institutions, healthcare providers, and other regulated industries.

For maximum security requirements

Context scaling

Longer context, higher savings

Context length | Savings | Evidence
4K tokens | 52% | Measured
32K tokens | ~80% | Estimated from measured anchors
128K tokens | 95% | Measured, 1,000 runs
500K tokens | 97-98% | Projected

Before vs after

The numbers speak

At a 128K context window

Metric | Baseline | Axropus
Prefix compute | 100% | 0%
Decode TPS | 70 TPS | 105 TPS
Total compute | 100% | 5%
Data transmitted | 100% | Zero

Automated deployment

One click. Fully automated.

The Axropus app detects your runtime, pairs the right draft model, wraps your engine, and verifies output — all automatically.

Axropus Deploy

  • Detecting runtime
  • Identifying model
  • Pairing draft model
  • Initializing AMF engine
  • Wrapping inference engine
  • Running verification
  • Live: saving up to 95% compute
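Runtime auto-detection could work in many ways. Here is a minimal sketch, assuming detection by checking which inference package is importable in the current Python environment. The module names are the real import names of vLLM, SGLang, TensorRT-LLM, and the llama-cpp-python bindings for llama.cpp. The priority order and `detect_runtime` itself are hypothetical, not the Axropus app's logic.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical priority order; each entry is the package's Python import name
# (llama.cpp is reached through the llama-cpp-python bindings, imported as llama_cpp).
RUNTIME_MODULES = ("vllm", "sglang", "tensorrt_llm", "llama_cpp")


def detect_runtime(
    find_spec: Callable[[str], object] = importlib.util.find_spec,
) -> Optional[str]:
    """Return the first installed runtime package name, or None if none is found."""
    for name in RUNTIME_MODULES:
        # find_spec locates a top-level module without importing it.
        if find_spec(name) is not None:
            return name
    return None


# Usage: inspect the real environment, or inject a fake finder for testing.
print(detect_runtime())
fake_finder = lambda name: object() if name == "sglang" else None
assert detect_runtime(fake_finder) == "sglang"
```

Injecting `find_spec` keeps the function testable without installing any runtime.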


Auto-detect

Runtime & model identified automatically


Zero config

No YAML, no ENV vars, no setup scripts

Verified output

Bit-for-bit match confirmed before going live

Zero-data architecture

Your data never leaves

Not encrypted-in-transit. Not anonymized. Never transmitted at all. An isolation layer inspects every outbound payload.


Stays on your infrastructure

  • All model weights
  • All prompts & responses
  • All KV cache data
  • All AMF prefix snapshots
  • All Spec V2 operations
  • Audit logs
Numbers only (for billing)

  • Token counts (for billing)
  • TPS measurements
  • AMF hit rate percentage
  • Spec V2 acceptance rate
  • GPU count & model family
  • SDK version
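One way an isolation layer could enforce this split is a deny-by-default allowlist applied to every outbound payload. Here is a minimal sketch, assuming metrics are sent as a flat dictionary. The field names and `check_outbound` are hypothetical. Model family and SDK version are treated as short string labels because they are not numeric.

```python
from numbers import Real
from typing import Any, Dict

# Hypothetical field names mirroring the list above.
NUMERIC_FIELDS = {"token_count", "tps", "amf_hit_rate", "spec_v2_acceptance_rate", "gpu_count"}
LABEL_FIELDS = {"model_family", "sdk_version"}
MAX_LABEL_LEN = 64  # hypothetical cap so a label cannot smuggle out free text


def check_outbound(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Deny by default: raise ValueError on anything that is not an approved metric."""
    for field, value in payload.items():
        if field in NUMERIC_FIELDS:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(f"{field} must be numeric")
        elif field in LABEL_FIELDS:
            if not isinstance(value, str) or len(value) > MAX_LABEL_LEN:
                raise ValueError(f"{field} must be a short string")
        else:
            raise ValueError(f"field not allowed: {field}")
    return payload


# Usage: a metrics-only payload passes; anything carrying content is rejected.
check_outbound({"token_count": 1200, "tps": 105.0, "model_family": "llama"})
try:
    check_outbound({"token_count": 1200, "prompt": "confidential text"})
except ValueError as err:
    print(err)  # prints: field not allowed: prompt
```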

Tamper-evident audit log: every SDK operation is hash-chained and stored locally for your security team to review.
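A hash chain makes a log tamper-evident by storing, in each entry, a hash that also covers the previous entry's hash. Here is a minimal sketch using SHA-256 over canonical JSON. The entry layout, the `GENESIS` value, and the event names are hypothetical, not the SDK's actual log format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical fixed "previous hash" for the first entry


def _digest(prev_hash: str, operation: Dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    body = json.dumps({"prev": prev_hash, "op": operation}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], operation: Dict) -> Dict:
    """Append an entry whose hash covers both the operation and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "op": operation, "hash": _digest(prev_hash, operation)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Recompute every link; editing, reordering, or removing an interior entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["op"]):
            return False
        prev_hash = entry["hash"]
    return True


# Usage with hypothetical event names.
log: List[Dict] = []
append_entry(log, {"event": "amf_snapshot_created", "tokens": 4096})
append_entry(log, {"event": "spec_v2_enabled", "k": 4})
assert verify_chain(log)
log[0]["op"]["tokens"] = 8192  # tamper with an earlier entry
assert not verify_chain(log)
```

A chain only proves internal consistency. Someone who rewrites every entry after an edit can produce a new valid chain, and dropping the newest entries is not detected. Keeping a copy of the latest hash somewhere else closes both gaps.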

Savings calculator

How much will you save?


Monthly projection

Current monthly spend


With Axropus


Monthly savings


Based on up to 95% compute reduction on replay-heavy workloads at 128K context.
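This page does not show the calculator's exact formula or input labels. As a rough sketch, a monthly projection could be computed as below, assuming spend scales linearly with GPU-hours and the compute reduction translates one-to-one into fewer GPU-hours. `monthly_projection` and the example inputs (8 GPUs at $3 per GPU-hour) are hypothetical. The reduction fractions come from the measured rows of the context-scaling table above.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_projection(gpus: int, usd_per_gpu_hour: float, compute_reduction: float):
    """Return (current spend, spend after reduction, savings) for one month.

    Assumes spend is linear in GPU-hours and that a compute reduction of r
    removes the same fraction r of GPU-hours (a simplifying assumption).
    """
    if not 0.0 <= compute_reduction <= 1.0:
        raise ValueError("compute_reduction must be between 0 and 1")
    current = gpus * usd_per_gpu_hour * HOURS_PER_MONTH
    reduced = current * (1.0 - compute_reduction)
    return current, reduced, current - reduced


# Usage: compare the measured 4K and 128K savings for a hypothetical 8-GPU fleet.
for label, reduction in [("4K, measured", 0.52), ("128K, measured", 0.95)]:
    current, reduced, saved = monthly_projection(8, 3.0, reduction)
    print(label, round(current), round(reduced), round(saved))
```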

FAQ

Common questions

Ready to start saving?

Cut inference costs by up to 95% today

3 lines of code. 5-minute deployment. Zero data leaves your infrastructure. Net-positive from day one.

founders@axropus.com — we reply in under 2 hours