Axiomander: Robots on Rails

Axiomander: Robots on Rails

LLMs have made programming dramatically faster. The problem is that speed and correctness have always pulled in opposite directions: formal verification — the kind that gives you real guarantees, not just test coverage — has historically required specialised languages, separate toolchains, and months of learning. You could move fast, or you could be sure. Not both.

Axiomander is an attempt to close that gap. It is a verification pipeline for Python that proves properties about your code using plain assert statements — no new language, no decorators, no imports. You describe what should hold, your LLM writes the implementation, and Axiomander tells you with certainty whether the contracts hold. When they don’t, it can often tell you exactly why.

The result is vericoding: using LLMs to accelerate development while getting the correctness guarantees that were formerly too expensive for everyday use. The same workflow that delivers a verified HTTPS stack in F*, or proves authorisation properties at AWS in Cedar, but without leaving Python or learning a proof assistant.

Under python -O, every assert and every ghost snapshot is stripped. The verification cost stays in the development loop — zero runtime overhead in production.

How It Works

You and your LLM write contracts as assert statements. The MCP tool classifies them by position. Type annotations carry constraints — -> bool means the result must be 0 or 1, dict[str,int] means the collection length is non-negative. These constraints are injected automatically: no user-written isinstance checks, no decorators. The contracts subsume type checking through the operations they use.

LLMs are often surprisingly good at the tricky parts — loop invariants in particular, which are the assertions that tell the verifier what stays true on every iteration of a loop. Traditionally these required real expertise to write correctly. With an LLM doing the first pass, you can get the benefits of loop invariant checking without needing to understand the theory behind them. And best of all, you don’t have to trust the LLM to get it right — the theorem prover will tell you if it didn’t.

# This is the target pattern — dict subscript and .get() in expressions
# are currently opaque to the automated prover. Full dict verification
# is the next milestone. Simple arithmetic and list examples verify today.
def transfer(balances: dict[str, int], sender: str, receiver: str,
             amount: int) -> int:
    assert amount >= 0                              # precondition
    assert balances.get(sender, 0) >= amount         # precondition: funds available
    if __debug__:
        old_s = balances.get(sender, 0)              # ghost snapshot
        old_r = balances.get(receiver, 0)            # (stripped by python -O)
    balances[sender] -= amount
    balances[receiver] += amount
    result = 1
    assert balances[sender] >= 0                     # postcondition: no overdraft
    assert balances[sender] + balances[receiver] == old_s + old_r  # conservation
    return result

The pipeline translates assertions through a contract linter into a typed intermediate representation, generates IMP commands from the function body, and produces a Coq theorem. Level 1 (wp_prove + lia) clears ~80% of goals automatically. Level 2 (SMT via cvc4/z3) handles non-linear arithmetic — z3 is preferred for string and float theories. Level 3, an LLM oracle with an interactive step-by-step coqpyt loop, tackles the rest.

Ghost state — variables used only in assertions, wrapped in if __debug__: blocks — is detected automatically. The verifier excludes them from the heap, adds them as logical existentials in the Coq theorem, and the Python bytecode compiler eliminates them under -O. Dafny-style ghost state, zero keywords.

What We Can Prove Today

The pipeline handles a substantial subset of Python:

Predicates as reusable specifications. Define is_sorted(lst) once, use assert is_sorted(result) everywhere. Pure predicates inline directly; loop predicates carry semantic postconditions via implies(result == 1, property).

Quantified invariants. assert all(result[j] == j for j in range(i)) proves every element of a growing list is correct. The forall goes to the VCG; the SMT solver handles the exit condition.

Conservation laws. assert old_sender + old_receiver == new_sender + new_receiver — money is neither created nor destroyed. Ghost snapshots capture pre-state without runtime cost.

Branch-precise contracts. assert implies(val < lo, result == lo) says “if the value was too low, it was clamped to the lower bound.” Implication makes postconditions precise.

Loop predicates with postcondition inlining. A predicate with a while-loop body carries a semantic postcondition. At call sites, the linter substitutes arguments and inlines the property.

Try It

git clone https://github.com/scidonia/axiomander
cd axiomander
uv sync
eval $(opam env)
uv run pytest py/tests/ -v

Or wire it into your editor as an MCP tool. The server exposes check-file, check-function, verify-function, verify-changed, verify-impacted, explain-cache, and frame-report — all callable from any MCP-compatible client (Claude Desktop, Cursor, Windsurf).

We are hiring. If you want to build the gold standard for verification systems — combining Coq, SMT, LLMs, and vanilla Python — get in touch.