a litmus test for every prompt your agents run

Score every prompt before your agents run it.

Your agents run whatever you hand them, even a half-baked one-liner. Litmusify reads each prompt on-device in under a millisecond and pulls the shaky ones aside before they spiral into a thousand-call mess.

Leave your email and we'll start solving your burn.

▲ sub-ms scoring · zero data retention · free for solo devs

// where the money goes

One lazy prompt. A $15 loop.

Nobody selling you tokens has a reason to help you use fewer. One vague instruction fans out: your agent ransacks the repo, guesses wrong, and tries again. And again.

prompt ▸ "fix the authentication bug" → unconstrained

agent calls · tokens burned · cost, 20 min later

with Litmusify: detoured before call #2 · $0.40, zero loop

Illustrative scenario.

// how it works

Score. Coach. Release.

No big model in the way. A small classifier on your machine reads each prompt in under a millisecond, weighing your instruction and never your code. Your best engineers never feel it.

01 score

Score, don't generate

Local classifiers read structure and specificity to score the prompt. No generative call, no waiting.

< 1 ms · on localhost

02 detour

Soft-detour the bad ones

Below the bar, the prompt is paused and coached right in your IDE, not thrown as an error.

native chat UI

03 release

Release the clean prompt

Tightened, it streams straight to the model. Litmusify trains the human as the work ships.

no perceptible added latency

// architecture

A proxy on localhost.

A tiny daemon on localhost catches each prompt right before it leaves for the network. Clean ones sail through; shaky ones loop back for a quick fix. Your code never goes anywhere.

IDE / agent (cursor · roo) → litmusify · localhost:8080 (score · <1ms) → model provider (claude · gpt)

⚠ ambiguous → detour back to IDE
✓ clean → released to model

raw prompt stays on localhost · only anonymized telemetry leaves the perimeter

// results

The number a CFO understands.

Your VP won't act on an average score. So we tie it to your merged PRs: the real compute cost of every shipped pull request, before and after.

COMPUTE PER MERGED PR ▼ 73%

$45 before → $12 after Litmusify · same velocity

<1ms · Added latency. No LLM in the path

0 bytes · Of your source code that ever leaves

Illustrative figures.

// developer incentive

Good prompts earn God Mode.

Nobody likes being audited, so we flipped it. Everyone starts on the cheap models. Write clean prompts and your litmus score climbs, unlocking higher limits and the best frontier models by default.

0–40 · haiku · base rate limit
40–70 · sonnet · 2× rate limit
70–90 · opus · 4× rate limit
90+ · all frontier · god mode · unlimited

// your team

The leaderboard that rewards clarity.

It's not a race to burn the most tokens. We rank your team by how clearly they ask, tied straight to cost per PR. Clarity rises, waste sinks.

# · dev · litmus · tier · $/pr
1 · @maya · 94 · god mode · $9
2 · @ravi · 88 · opus · $11
3 · @sam · 73 · sonnet · $16
4 · @alex · 54 · haiku · $27
5 · @jordan · 39 · haiku · $41

Sample team data.

// the moat

It gets sharper with every prompt.

A gateway can count your tokens. It can't see what you meant. We learn from the one signal nobody else has: which prompts ended in a clean merge, and which spiralled. Every prompt sharpens the next call.

edge cases in the ledger ▲ 38% / quarter

01 synthetic bootstrap: tens of thousands of simulated lazy-vs-clean prompts.
02 open-source honeypot: a free local tool; solo devs opt in to anonymized telemetry.
03 enterprise ledger: intent to outcome across real codebases. The specialized model.

Infra gateways clone an API proxy in weeks. They can't clone the data.

Illustrative figures.

Your machine / VPC: classifiers · embeddings · raw prompt (stays here)

↓ anonymized telemetry only ↓

Litmusify dashboard: scores · token counts · detour rate

// privacy by architecture

Your code never leaves the machine.

The classifiers, embeddings, and scoring all run on your machine or in your VPC. Litmusify never reads a word of your prompts or code, so your security team has only the anonymized telemetry to sign off on.

Raw prompts are never logged centrally.
Only anonymized telemetry leaves the device.
Preserves your zero-retention vendor deal.

// faq

Questions, answered.

What engineers and their VPs ask before they let us near the agents.

No. Classifiers, embeddings, and scoring all run as a local proxy on your machine or in your VPC. Only anonymized telemetry ever syncs: the litmus score, token counts, and detour rate. Your raw prompts and source are never sent to Litmusify, so a security team has only that telemetry to review.

Lightweight local models (think XGBoost and a small PyTorch classifier, not a frontier LLM) read structural signals: target specificity, action verbs, acceptance criteria, and vector distance from known-good prompts. Those features produce a probabilistic 0–100 score in well under a millisecond.
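To make the idea concrete, here is a minimal sketch of structural scoring in plain Python. It is not Litmusify's model: the feature names, word lists, weights, and bias are all hypothetical and hand-picked, a logistic function stands in for the trained classifiers, and the embedding-distance signal is left out entirely.

```python
import math
import re

# Hypothetical, hand-picked weights for illustration only. A production
# scorer would learn these from data rather than hard-code them.
WEIGHTS = {
    "has_target": 1.6,       # names a file, function, or path
    "has_action_verb": 0.8,  # uses a concrete verb
    "has_acceptance": 1.4,   # says how to tell the work is done
    "length": 0.9,           # word count, capped, as a rough proxy for detail
}
BIAS = -2.5

ACTION_VERBS = {"add", "fix", "remove", "rename", "replace", "refactor", "validate"}
ACCEPTANCE_HINTS = ("should", "must", "so that", "expect", "returns")
# Matches things like "session.ts", "refreshSession()" or "/session".
TARGET_PATTERN = re.compile(r"\b\w+\.\w{1,4}\b|\b\w+\(\)|/\w+")


def extract_features(prompt):
    """Turn a prompt into a few structural signals between 0 and 1."""
    text = prompt.lower()
    words = re.findall(r"[a-z_]+", text)
    return {
        "has_target": 1.0 if TARGET_PATTERN.search(prompt) else 0.0,
        "has_action_verb": 1.0 if ACTION_VERBS.intersection(words) else 0.0,
        "has_acceptance": 1.0 if any(h in text for h in ACCEPTANCE_HINTS) else 0.0,
        "length": min(len(words) / 30, 1.0),
    }


def litmus_score(prompt):
    """Weighted sum of the features, squashed to an integer from 0 to 100."""
    features = extract_features(prompt)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return round(100 / (1 + math.exp(-z)))


vague = "fix the authentication bug"
clear = ("In auth/session.ts, replace the expired-token check in refreshSession() "
         "so it returns 401; the test in session.test.ts must pass.")
print(litmus_score(vague), litmus_score(clear))
```

With these made-up weights the specific prompt scores well above the vague one, because it names a target and states an acceptance condition.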

No. There's no LLM in the hot path. Local classifiers score the prompt in under a millisecond, so a well-formed prompt passes straight through with no perceptible latency.

Rarely. The score is an exponentially weighted moving average with session inertia, so early exploratory turns barely move it and one-off shorthand on a senior's profile sails through. Only sustained ambiguity trips a detour, and even then it coaches rather than blocks.
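A rough sketch of how an exponentially weighted moving average damps a single bad prompt. The smoothing factor `alpha` and the detour bar of 50 are assumptions made for illustration, not Litmusify's actual settings, and the "session inertia" part is not modeled here.

```python
DETOUR_BAR = 50.0  # hypothetical threshold, not a documented value


def update_session_score(previous, prompt_score, alpha=0.2):
    """Blend the newest prompt score into the running session score.

    A small alpha means history dominates, so one outlier moves the
    average only slightly.
    """
    return alpha * prompt_score + (1 - alpha) * previous


# One terse prompt (scored 20) from someone sitting at 85 stays above the bar.
after_one = update_session_score(85.0, 20.0)

# Five terse prompts in a row drag the average below the bar.
score = 85.0
for _ in range(5):
    score = update_session_score(score, 20.0)

print(round(after_one, 1), after_one >= DETOUR_BAR)
print(round(score, 1), score >= DETOUR_BAR)
```

A lower `alpha` makes the score more forgiving of one-off shorthand; a higher one makes it react faster to a run of ambiguous prompts.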

Never a hard block. Litmusify uses a soft detour: it pauses an ambiguous prompt and returns a coaching note rendered natively in your IDE. Tighten the prompt and it's released instantly. No error codes, no broken flow.
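The soft detour can be pictured as a routing decision that never raises an error. The sketch below is an assumption about the shape of that decision only: the `Decision` type, the `route` function, the threshold, and the wording of the coaching note are all hypothetical.

```python
from dataclasses import dataclass

DETOUR_THRESHOLD = 50.0  # hypothetical bar, not a documented value


@dataclass
class Decision:
    action: str              # "release" or "detour"
    coaching_note: str = ""  # empty when the prompt is released


def route(prompt_score, threshold=DETOUR_THRESHOLD):
    """Release a clean prompt, or pause a shaky one with a coaching note."""
    if prompt_score >= threshold:
        return Decision(action="release")
    return Decision(
        action="detour",
        coaching_note="Name the file or function, and say how you'll know it's fixed.",
    )


print(route(80.0))
print(route(20.0))
```

Either way the caller gets an ordinary value back, so the IDE can render the note as a chat message instead of handling a failure.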

Anything that speaks MCP or an OpenAI-compatible API: Cursor, Claude Code, Roo Code, Devin, Windsurf, and more, in front of any frontier model. It runs as a local proxy, so it sits in front of the agents without changing your setup.

About five minutes. Drop in the local daemon, point your IDE's base URL (or MCP server) at localhost, and it starts scoring. A file-watcher keeps the config pinned so nobody drifts off the proxy by accident.

The dashboard cross-references the anonymized telemetry with your version control to calculate compute cost per merged PR, the one metric leadership can act on. Not a subjective "litmus score 74", just dollars per shipped PR, before and after.
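The arithmetic behind that metric is simple enough to sketch. The totals and PR counts below are invented inputs chosen to reproduce this page's illustrative $45 and $12 figures; the function names are hypothetical.

```python
def cost_per_merged_pr(total_compute_usd, merged_prs):
    """Total compute spend for a period divided by PRs merged in that period."""
    if merged_prs <= 0:
        raise ValueError("need at least one merged PR")
    return total_compute_usd / merged_prs


def percent_reduction(before, after):
    """How much lower `after` is than `before`, as a percentage of `before`."""
    return (before - after) / before * 100


# Invented example: the same 20 merged PRs per period, before and after.
before = cost_per_merged_pr(900.0, 20)
after = cost_per_merged_pr(240.0, 20)

print(before, after, round(percent_reduction(before, after)))
```

Holding the number of merged PRs constant is what "same velocity" means here: the spend drops while the output does not.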

Yes. The whole stack runs inside your VPC with no raw data leaving the perimeter, which is exactly what InfoSec wants to hear. Team and enterprise plans add the shared dashboard, SSO, and PR-level reporting. Free for solo developers.

// the team · women-founded

Built by people who've felt the burn.

Two women building the guardrails for the agentic era. We sat beside the engineers, watched the spend spiral from a single lazy prompt, and built the litmus test we wished we'd had.

Aditi

CEO & Co-founder

Product leader with 8+ years building 0→1 products across AI, edtech, and consumer. As a founding PM she shipped AI features to large user bases and rebuilt core data systems around large language models. That's where she watched unmanaged prompt quality turn into runaway compute spend that nobody owned. She started Litmusify to put a litmus test in front of every prompt, and leads its product and strategy.

Cassandra Mackin

CTO & Co-founder

CS master's candidate in AI at Georgia Tech, with a background spanning bioinformatics, linguistics, and UX design. Her work ranges from reasoning systems and ARC-style puzzles to state estimation with Kalman filters and path planning with A* search. Earlier she led UX across logistics, onboarding, and accessibility-focused products. At Litmusify she owns the engine: the local classifiers, embeddings, and loopback proxy that keep scoring fast and private.
