a litmus test for every prompt your agents run

Score every prompt before your agents run it.

Agentic tools turn one lazy prompt into a thousand-call hallucination loop. Litmusify intercepts each prompt on-device, scores it in under a millisecond, and detours the bad ones before they spend a cent.

Leave your email and we'll start solving your burn.
↓ or try the live demo
sub-ms scoring · zero data retention · cursor · claude code · roo code

sits in front of the agents you already run

Cursor · Claude Code · Roo Code · Devin · Windsurf
// the problem

One lazy prompt. A $15 loop.

Providers bill by the token. They have zero incentive to constrain your prompts. So one ambiguous instruction fans out into a brute-force search, a context flood, and a recursive failure loop.

prompt ▸ "fix the authentication bug" → unconstrained
with Litmusify: detoured before call #2 · $0.40, zero loop

Illustrative scenario.

// how it works

Score. Coach. Release.

No frontier model in the hot path. Local classifiers read the prompt and decide in under a millisecond. No perceptible latency, no token tax. Litmusify judges the human's instruction, then steps aside.

01 score

Score, don't generate

Local classifiers read structure and specificity to score the prompt. No generative call, no waiting.

< 1 ms · on localhost
02 detour

Soft-detour the bad ones

Below the bar, the prompt is paused and coached right in your IDE, not thrown as an error.

native chat UI
03 release

Release the clean prompt

Tightened, it streams straight to the model. Litmusify trains the human as the work ships.

no perceptible added latency
// architecture

A local proxy on localhost.

Litmusify runs as a loopback daemon that intercepts the final payload before it hits the network. Clean prompts pass straight through; ambiguous ones loop back for coaching. The raw prompt never leaves the machine.

IDE / agent (cursor · roo) → litmusify · localhost:8080 (score · <1ms) → model provider (claude · gpt)
⚠ ambiguous → detour back to IDE
✓ clean → released to model
raw prompt stays on localhost · only anonymized telemetry leaves the perimeter
// results

The number a CFO understands.

Litmusify cross-references telemetry with merged PRs, so the savings show up as dollars, not vibes. Not "litmus score 74." Real compute per shipped PR.

Compute cost per merged PR: $45 before → $12 after Litmusify (▼ 73%) · same velocity
Added latency: <1 ms. No LLM in the path
Source code that ever leaves: 0 bytes

Illustrative figures.

// developer incentive

Good prompts earn God Mode.

Everyone starts on fast, cheap models. Clear the classifier without detours and your litmus score climbs, unlocking higher rate limits and the most powerful frontier models by default.

litmus score   default model             rate limit
0–40           haiku                     base
40–70          sonnet                    2×
70–90          opus                      4×
90+            all frontier · god mode   unlimited
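
The tier ladder above can be read as a simple lookup. The sketch below is illustrative only: the `Tier` type and `tier_for` function are hypothetical names, and treating each band's lower bound as inclusive is an assumption, since the published ranges share their endpoints (40, 70, 90).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rate_limit: str


# Bands copied from the tier table, highest floor first.
# Assumption: a score exactly on a boundary belongs to the higher tier.
_TIERS = [
    (90, Tier("all frontier · god mode", "unlimited")),
    (70, Tier("opus", "4x")),
    (40, Tier("sonnet", "2x")),
    (0, Tier("haiku", "base")),
]


def tier_for(score: float) -> Tier:
    """Map a 0-100 litmus score to its tier."""
    if not 0 <= score <= 100:
        raise ValueError("litmus score must be between 0 and 100")
    for floor, tier in _TIERS:
        if score >= floor:
            return tier
    raise AssertionError("unreachable: the last floor is 0")
```

Under these assumptions, `tier_for(39)` returns the haiku tier and `tier_for(94)` returns the god-mode tier.
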
// the team

The leaderboard that rewards clarity.

Not a tokenmaxxing race. Litmusify ranks engineers by their litmus score and ties it straight to compute cost per PR, so clarity climbs and waste sinks.

#   dev       litmus   tier       $/pr
1   @maya     94       god mode   $9
2   @ravi     88       opus       $11
3   @sam      73       sonnet     $16
4   @alex     54       haiku      $27
5   @jordan   39       haiku      $41

Sample team data.

// the moat

It gets sharper with every prompt.

Gateways see tokens; they can't see intent. Litmusify learns from the one signal nobody else has: which prompts led to clean merges, and which spiralled. That edge compounds.

edge cases in the ledger · ▲ 38% / quarter
01 synthetic bootstrap: tens of thousands of simulated lazy-vs-clean prompts.
02 open-source honeypot: a free local tool; solo devs opt in to anonymized telemetry.
03 enterprise ledger: intent to outcome across real codebases. The specialized model.

Infra gateways clone an API proxy in weeks. They can't clone the data.

Illustrative figures.

Your machine / VPC: classifiers · embeddings · raw prompt (stays here)
↓  anonymized telemetry only  ↓
Litmusify dashboard: scores · token counts · detour rate
// privacy by architecture

Your code never leaves the machine.

Classifiers, embeddings, and scoring run as a local proxy on your machine or in your VPC. There's no third party reading your prompts, so there's nothing for InfoSec to review.

  • Raw prompts are never logged centrally.
  • Only anonymized telemetry leaves the device.
  • Preserves your zero-retention vendor deal.
// faq

Questions, answered.

Everything engineers and their VPs ask before dropping Litmusify in front of the agents.

No. Classifiers, embeddings, and scoring all run as a local proxy on your machine or in your VPC. Only anonymized telemetry ever syncs: the litmus score, token counts, and detour rate. Your raw prompts and source are never transmitted, so there's nothing for a security team to review.

Lightweight local models (think XGBoost and a small PyTorch classifier, not a frontier LLM) read structural signals: target specificity, action verbs, acceptance criteria, and vector distance from known-good prompts. Those features produce a probabilistic 0–100 score in well under a millisecond.
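
To make the idea concrete, here is a minimal sketch of feature-based scoring. It is not Litmusify's classifier: the regexes, the verb list, the hand-set `WEIGHTS`, and the `BIAS` are all hypothetical stand-ins for a trained model, and the vector-distance feature is left out to keep the example dependency-free.

```python
import math
import re

# Hypothetical structural signals, loosely following the ones named above.
ACTION_VERBS = {"add", "fix", "rename", "remove", "refactor",
                "update", "replace", "return", "raise", "validate"}
FILE_OR_SYMBOL = re.compile(r"\b[\w/]+\.(py|ts|js|go|rs|java)\b|\b\w+\(\)")
ACCEPTANCE = re.compile(r"\b(should|must|so that|expect|returns?|passes|until)\b",
                        re.IGNORECASE)

# Hand-set weights standing in for learned parameters (assumption).
WEIGHTS = {"target_specificity": 2.0, "action_verb": 1.0, "acceptance_criteria": 2.0}
BIAS = -2.5


def features(prompt: str) -> dict[str, float]:
    """Extract three binary structural features from the prompt text."""
    words = re.findall(r"[a-zA-Z_]+", prompt.lower())
    return {
        "target_specificity": 1.0 if FILE_OR_SYMBOL.search(prompt) else 0.0,
        "action_verb": 1.0 if any(w in ACTION_VERBS for w in words) else 0.0,
        "acceptance_criteria": 1.0 if ACCEPTANCE.search(prompt) else 0.0,
    }


def litmus_score(prompt: str) -> float:
    """Squash a weighted feature sum through a logistic into a 0-100 score."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features(prompt).items())
    return 100.0 / (1.0 + math.exp(-z))
```

With these illustrative weights, "fix the authentication bug" scores well below 40, while a prompt that names a file, a function, and an expected result scores above 90.
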

No. There's no LLM in the hot path. Local classifiers score the prompt in under a millisecond, so a well-formed prompt passes straight through with no perceptible latency.

Rarely. The score is an exponentially weighted moving average with session inertia, so early exploratory turns barely move it and one-off shorthand on a senior's profile sails through. Only sustained ambiguity trips a detour, and even then it coaches rather than blocks.
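
A rough sketch of how an exponentially weighted moving average with a warm-up period might behave. The `SessionScore` class, the `alpha` of 0.2, and the three-turn warm-up are assumptions chosen for illustration, not Litmusify's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class SessionScore:
    value: float            # running profile score, 0-100
    alpha: float = 0.2      # hypothetical smoothing factor
    warmup_turns: int = 3   # hypothetical: early turns carry less weight
    turns: int = 0

    def update(self, prompt_score: float) -> float:
        """Fold one prompt's score into the running average."""
        self.turns += 1
        # Session inertia: scale alpha down during the first few turns.
        a = self.alpha * min(1.0, self.turns / (self.warmup_turns + 1))
        self.value = (1 - a) * self.value + a * prompt_score
        return self.value
```

Under these assumptions, one weak prompt on a profile at 90 only nudges the score down a few points, while a long run of weak prompts pulls it toward their level.
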

Never a hard block. Litmusify uses a soft detour: it pauses an ambiguous prompt and returns a coaching note rendered natively in your IDE. Tighten the prompt and it's released instantly. No error codes, no broken flow.
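
The release-or-detour decision can be sketched as a function that returns either a release or a coaching note, and never raises an error. The threshold of 40, the `Decision` type, and the note text are hypothetical; the source does not state where the bar sits.

```python
from dataclasses import dataclass
from typing import Optional

DETOUR_THRESHOLD = 40.0  # hypothetical bar; the real value is not published


@dataclass(frozen=True)
class Decision:
    action: str                          # "release" or "detour"
    coaching_note: Optional[str] = None  # set only on a detour


def route(score: float, threshold: float = DETOUR_THRESHOLD) -> Decision:
    """Release prompts at or above the bar; coach the rest instead of erroring."""
    if score >= threshold:
        return Decision("release")
    return Decision(
        "detour",
        "Name the file or function to change, and say what 'done' looks like.",
    )
```

Returning a note as data, rather than throwing, is what lets the caller show it as an ordinary chat message.
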

Anything that speaks MCP or an OpenAI-compatible API: Cursor, Claude Code, Roo Code, Devin, Windsurf, and more, in front of any frontier model. It runs as a local proxy, so it sits in front of the agents without changing your setup.

About five minutes. Drop in the local daemon, point your IDE's base URL (or MCP server) at localhost, and it starts scoring. A file-watcher keeps the config pinned so nobody drifts off the proxy by accident.

The dashboard cross-references the anonymized telemetry with your version control to calculate compute cost per merged PR, the one metric leadership can act on. Not a subjective "litmus score 74", just dollars per shipped PR, before and after.
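
The arithmetic behind that metric is simple division. In the sketch below the function names are hypothetical, and the $45 and $12 inputs are the page's own illustrative figures, not measured results.

```python
def cost_per_merged_pr(total_compute_usd: float, merged_prs: int) -> float:
    """Total agent compute spend divided by the number of merged PRs."""
    if merged_prs <= 0:
        raise ValueError("need at least one merged PR")
    return total_compute_usd / merged_prs


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return (before - after) / before * 100.0
```

For example, $900 of compute across 20 merged PRs is $45 per PR, and moving from $45 to $12 is a reduction of about 73%.
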

Yes. The whole stack runs inside your VPC with no raw data leaving the perimeter, which is exactly what InfoSec wants to hear. Team and enterprise plans add the shared dashboard, SSO, and PR-level reporting. Free for solo developers.

// the team

Built by people who've felt the burn.

Two founders who watched agentic spend balloon from lazy prompts and decided to put a litmus test in front of it.

Aditi
CEO & Co-founder

Product leader with 8+ years building 0→1 products across AI, edtech, and consumer. As a founding PM she shipped AI features to large user bases and rebuilt core data systems around large language models. That's where she watched unmanaged prompt quality turn into runaway compute spend that nobody owned. She started Litmusify to put a litmus test in front of every prompt, and leads its product and strategy.

Cassandra Mackin
CTO & Co-founder

CS master's candidate in AI at Georgia Tech, with a background spanning bioinformatics, linguistics, and UX design. Her work ranges from reasoning systems and ARC-style puzzles to state estimation with Kalman filters and path planning with A* search. Earlier she led UX across logistics, onboarding, and accessibility-focused products. At Litmusify she owns the engine: the local classifiers, embeddings, and loopback proxy that keep scoring fast and private.


Stop paying for prompts that were never going to work.

Tell us where it's burning and we'll help you put it out. Drop your email, or reach a founder directly at aditi@litmusify.com.

Free for solo developers. We read every one.