6 canonical problems.
One closed-form problem per equation in the corpus, each with a board-sourced citation chain and a locally verified numerical answer. The board is asked for the formula and the primary-source references, never for the number. The number comes from a real CAS stack (sympy + mpmath + python-flint).
The v5 protocol
Language models recall formulas reliably and cite primary sources cleanly. They do not do precise arithmetic — a pendulum period calculation that requires transcendentals can be off by parts in 10⁵ from run to run on the same prompt. The protocol therefore treats the board as a formula and citation source, and runs the actual math locally with deterministic tools. The board's best-effort value_attempt is still captured in the ledger as capability data on when LLMs can and can't do math, but it is never load-bearing.
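A minimal sketch of that split, assuming a hypothetical ledger record: only value_attempt is a name from the protocol; LedgerEntry, its other fields, and the sample numbers are illustrative. The small-angle pendulum formula and the standard-library math module stand in for the real problem set and the CAS stack.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LedgerEntry:
    """Hypothetical ledger record; only `value_attempt` is named in the protocol."""
    formula: str                    # formula text returned by the board
    citations: List[str]            # primary-source references returned by the board
    value_attempt: Optional[float]  # board's best-effort number, recorded but never trusted
    local_value: float              # locally computed number, the only load-bearing one

    def attempt_relative_error(self) -> Optional[float]:
        """Capability data: how far the board's number sits from the local one."""
        if self.value_attempt is None:
            return None
        return abs(self.value_attempt - self.local_value) / abs(self.local_value)


def pendulum_period(length_m: float, g: float = 9.80665) -> float:
    """Small-angle pendulum period T = 2*pi*sqrt(L/g), evaluated locally."""
    return 2.0 * math.pi * math.sqrt(length_m / g)


entry = LedgerEntry(
    formula="T = 2*pi*sqrt(L/g)",
    citations=["<primary-source reference supplied by the board>"],
    value_attempt=2.0064,  # made-up board answer, for illustration only
    local_value=pendulum_period(1.0),
)
print(entry.local_value, entry.attempt_relative_error())
```

The board's number is stored next to the local one so the gap can be studied later, but nothing downstream reads value_attempt as an answer.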
The four checks
Symbolic equivalence
The board's formula must collapse to zero against the architect's local formula under a canonicalization pipeline — rational normalization, trig and power simplification, polynomial remainder, and a discriminating random-point evaluation that deliberately avoids branch cuts and singularities.
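The check can be sketched with sympy roughly as follows. This is an illustration under stated assumptions, not the actual pipeline: the function names, the sampling range (0.5, 2.0), the tolerance, and the pendulum formulas are hypothetical, the polynomial-remainder stage is omitted, and all symbols are declared positive so that the sampled points stay away from zero and from branch cuts.

```python
import random
import sympy as sp


def canonical_residual(board_expr, local_expr):
    """Reduce board - local through a few canonicalization passes."""
    diff = sp.together(board_expr - local_expr)  # rational normalization
    diff = sp.trigsimp(diff)                     # trig simplification
    diff = sp.powsimp(diff)                      # power simplification
    return sp.simplify(diff)


def is_equivalent(board_expr, local_expr, symbols, n_points=10, seed=12345, tol=1e-12):
    """True only if the residual is symbolically zero and vanishes at random points."""
    diff = canonical_residual(board_expr, local_expr)
    rng = random.Random(seed)  # reproducible points
    for _ in range(n_points):
        # Assumed safe range: strictly positive, away from 0 and from infinity.
        point = {s: rng.uniform(0.5, 2.0) for s in symbols}
        value = complex(sp.N(diff.subs(point), 30))
        if abs(value) > tol:
            return False  # one clearly nonzero point is enough to reject
    return diff == 0


L, g = sp.symbols("L g", positive=True)
local = 2 * sp.pi * sp.sqrt(L / g)
board_ok = 2 * sp.pi * sp.sqrt(L * g) / g   # same formula, written differently
board_bad = 2 * sp.pi * sp.sqrt(g / L)      # ratio inverted

print(is_equivalent(board_ok, local, [L, g]))
print(is_equivalent(board_bad, local, [L, g]))
```

The random-point pass only ever rejects: a residual that sympy cannot reduce to zero is treated as a failure even when it happens to vanish at every sampled point.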
Property-based sampling
Hypothesis-style stratified sampling across orders of magnitude for every declared variable, with boundary cases injected (min, max, zero, unity, near-singularity) and a reproducible seed logged in the ledger. 200 samples in Phase 12; power analysis drives the number toward 10,000 in Phase 13+.
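A rough sketch of the sampling idea using only the standard library, not the Hypothesis package itself. The function name, the one-stratum-per-decade layout, and the extra_boundaries parameter are assumptions for illustration; the sketch covers strictly positive ranges only, so zero and near-singularity cases would have to be supplied per variable.

```python
import math
import random


def stratified_samples(lo, hi, n, seed, extra_boundaries=()):
    """Log-uniform samples spread across decades of [lo, hi], plus boundary cases."""
    if not (0 < lo < hi):
        raise ValueError("this sketch assumes a strictly positive range")
    rng = random.Random(seed)  # the seed is what would be logged in the ledger

    # Injected boundary cases: min, max, unity, and any caller-supplied extras.
    points = [lo, hi]
    points += [b for b in (1.0, *extra_boundaries) if lo < b < hi]

    log_lo, log_hi = math.log10(lo), math.log10(hi)
    strata = max(1, math.ceil(log_hi - log_lo))  # roughly one stratum per decade
    width = (log_hi - log_lo) / strata

    for i in range(max(0, n - len(points))):
        k = i % strata  # round-robin so every stratum gets sampled
        exponent = log_lo + (k + rng.random()) * width
        points.append(10.0 ** exponent)
    return points


samples = stratified_samples(1e-3, 1e3, n=200, seed=42)
print(len(samples), min(samples), max(samples))
```

Calling the function again with the same seed reproduces the same list, which is what makes a failing sample replayable from the ledger.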
High-precision fragility
mpmath at 50 decimal places compared against sympy at 15 digits — this is explicitly labeled a within-sympy fragility check, not an independent cross-CAS check, because mpmath is sympy's own numerical backend.
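As an illustration of this comparison, the sketch below evaluates one expression through sympy at 15 digits and through mpmath at 50 decimal places, then reports the relative gap. The function name fragility_gap, the pendulum expression, and the exact rational inputs are assumptions made for the example.

```python
import mpmath
import sympy as sp


def fragility_gap(expr, point, low_digits=15, high_dps=50):
    """Compare a 15-digit sympy evaluation against a 50-dps mpmath evaluation."""
    low = expr.subs(point).evalf(low_digits)

    # Evaluate the same expression directly in mpmath at higher precision.
    fn = sp.lambdify(list(point.keys()), expr, "mpmath")
    with mpmath.workdps(high_dps):
        args = [mpmath.mpf(v.p) / v.q for v in point.values()]  # exact rationals in
        high = fn(*args)
        gap = abs((mpmath.mpf(str(low)) - high) / high)
    return low, high, float(gap)


L, g = sp.symbols("L g", positive=True)
expr = 2 * sp.pi * sp.sqrt(L / g)
point = {L: sp.Rational(1), g: sp.Rational(980665, 100000)}

low, high, gap = fragility_gap(expr, point)
print(low)
print(mpmath.nstr(high, 50))
print(gap)
```

A large gap flags an expression that is numerically fragile at working precision. Because both numbers ultimately come from mpmath, agreement here says nothing about bugs shared by the two paths.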
python-flint interval
The genuinely orthogonal numeric engine: Arb-based ball arithmetic that shares no code with sympy. Phase 12 is best-effort with graceful skip; Phase 13+ makes it a hard requirement.
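A best-effort sketch of the interval check with a graceful skip, assuming python-flint exposes arb with pi(), sqrt(), rad(), and float conversion as used below. The function name and the rational-input convention are hypothetical, and the pendulum formula is again only a stand-in.

```python
def pendulum_period_ball(length_num, length_den, g_num, g_den):
    """Enclose T = 2*pi*sqrt(L/g) in an Arb ball; return None if python-flint is absent."""
    try:
        from flint import arb
    except ImportError:
        return None  # graceful skip, as in Phase 12

    # Build inputs from integers so that no binary rounding enters before Arb does.
    length = arb(length_num) / length_den
    g = arb(g_num) / g_den
    return 2 * arb.pi() * (length / g).sqrt()


ball = pendulum_period_ball(1, 1, 980665, 100000)
if ball is None:
    print("python-flint not installed; interval check skipped")
else:
    midpoint = float(ball)       # midpoint of the enclosing ball
    radius = float(ball.rad())   # rigorous error radius
    print(midpoint, radius)
```

The ball's radius is a guaranteed bound on the rounding error, so a local value that falls outside the ball by more than its own rounding error points to a real disagreement between engines.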