Building the Reasoning Engine at Axiom

Mike's Notes

This is a very worthy aim. I hope they succeed. I discovered this article while reading This Week In AI Research by Dr Ashish Bamania.

"AxiomProver is an autonomous multi-agent ensemble theorem prover for Lean 4.21, developed by Axiom Math.

It autonomously and fully solved 12 out of 12 problems in Putnam 2025, the world’s hardest college-level math test, using the formal verification language Lean, 8 of which within the exam time.

A repository containing the solutions generated by AxiomProver can be found using this link.

A technical report will follow in the coming days, as per the team." - This Week In AI Research

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > This Week In AI Research
  • Home > Handbook > 

Last Updated

13/01/2026

Building the Reasoning Engine at Axiom

By: Axiom
Axiom: 24/05/2025


How hierarchical planning, verification, and self-play converge to mathematical superintelligence.

For centuries, mathematics has been humanity's most powerful sandbox – a place where we construct logical frameworks to understand reality's most complex systems. Why do grid cells fire in a hexagonal pattern? Why do quantum energy levels align with the spacing of primes?

Yet mathematical progress has always been shackled by a cruel, fundamental bottleneck: the scarcity of extraordinary minds. When Évariste Galois revolutionized algebra in a fevered night before his fatal duel at twenty, he left behind ideas so far ahead of their time that decades passed before his contemporaries could grasp them. Srinivasa Ramanujan, before he succumbed to malnutrition and ill health, channeled thousands of formulas from his dreams into notebooks – results so profound that mathematicians spent nearly a century proving what he had intuited.

Even when breakthroughs do arrive as a mathematician's legacy, the extreme fragmentation of modern mathematics means experts in different subfields often cannot understand each other's work, so vital connections between domains remain hidden for decades. This combination of scarce genius and siloed knowledge creates an extraordinarily long pipeline from discovery to application: fundamental theorems discovered today might take generations before their full implications reshape technology and society – a delay even longer than Hardy anticipated in A Mathematician's Apology.

We're on a moonshot mission to change that. Axiom is building a reasoning engine capable of mathematical discovery at a scale and speed previously unimaginable – an AI mathematician.

The timing

Three trends are colliding for the first time in history.

First, neural networks have stepped beyond pattern matching into scalable reasoning, improving as compute, model size, and data grow.

Second, mathematical formalization has come of age through languages like Lean: by the Curry–Howard correspondence, proofs become executable programs, and programming languages are no longer just tools for producing outputs but instruments for certifying properties of abstract objects.

And lastly, LLMs have crossed a critical threshold in code generation, reliably producing high-quality code across many languages – including formal specification languages – and serving as a strong prior over the otherwise infinite action space of mathematics.

This synergy creates an unprecedented opportunity: reasoning engines that can conjecture and prove an endless stream of theorems with zero human involvement.
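To make the Curry–Howard point in the second trend concrete, here is a minimal Lean 4 sketch (illustrative, not taken from the article): a proof of an implication is literally a function, and a proof of a computational fact can be a one-word program.

```lean
-- Under Curry–Howard, a proposition is a type and a proof is a program
-- inhabiting that type: proving `p ∧ q → q ∧ p` means writing a function
-- that swaps the two components of a pair.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.right, h.left⟩

-- A computational fact is certified the same way: `n + 0` reduces to `n`
-- by the definition of addition, so reflexivity is the whole program.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl
```

Both theorems are checked by the compiler in the same pass that would type-check ordinary code, which is exactly what makes proofs executable programs.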

The Convergence

Autoformalization is the Natural Language Compiler

Our data factory

Imagine you were a programmer in the 1950s. Your day to day was punching machine code into cards.

In the 1970s, you were wrestling with FORTRAN. By the 1990s, maybe C++. Today? You're basically just talking to the computer in English. Turing's childhood dream is now your reality with coding agents.

Each generation got to think a little less about the machine and a little more about the fun problem they were actually trying to solve.

Modern compilation pipelines are one-way DAGs. They take Python and transform it down the stack through multiple representations until it becomes machine code. There's some upward flow – as you type in your IDE, a partial compilation happens via the Language Server Protocol, feeding information back up to create those squiggly lines and suggestions. But compilers never go fully back up the abstraction ladder. Disassembly exists, but it doesn't reconstruct your original high-level intent.

Mathematics needs something compilers never achieved: a true bidirectional cycle. For thousands of years, mathematicians have thought in high-level, intuitive leaps, not formal logic. Yet math is undergoing a modernization. With proofs now spanning hundreds of pages, they are often riddled with hidden errors. In fact, whenever a proof stubbornly resists being formally verified, the informal human source has turned out to be wrong – just recently, mistakes were found and fixed during the formalization effort for Fermat's Last Theorem. The bottom formal layer catches what high-level intuition misses.

Meanwhile, autoformalization – the process of converting natural-language proofs into Lean code – is a form of hierarchical planning, bridging the abstraction layers.

Going down the stack: Autoformalization translates intuitive mathematical reasoning into formal proof – the compiler's traditional direction.

Going up the stack: Autoinformalization translates formal proofs back into human intuition – something compilers never truly do.

When combined, these create an interactive prover in natural language, freeing mathematicians to explore dozens of strategies and rapidly prototype ideas while the machine handles formalization.
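As a toy illustration of the downward direction, an informal statement and one possible formalization are sketched below in Lean 4. The snippet is illustrative and self-contained: the local `IsEven` definition stands in for Mathlib's `Even`, and nothing here describes Axiom's actual autoformalization pipeline.

```lean
-- Informal statement: "The sum of two even natural numbers is even."
-- One possible autoformalization (self-contained; Mathlib's `Even` would
-- normally be used instead of this local definition):
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

theorem isEven_add {m n : Nat} (hm : IsEven m) (hn : IsEven n) :
    IsEven (m + n) :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ => ⟨a + b, by omega⟩
```

Autoinformalization would run the other way, turning the formal proof term back into the English sentence at the top.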

But here's the most powerful part: machine discoveries at the formal level feed back up. The machine finds patterns, lemmas, and connections in the formal space that humans never intuited, then surfaces them as new high-level insights. The mathematician's intuition becomes augmented by discoveries happening in a space they couldn't naturally explore.

The compiler revolution took decades; the mathematical compiler revolution is happening now.

Formal Verification Guides Mathematical World Modeling

Our algorithmic core

You are a gold prospector in 1849. Everyone brings you shiny rocks claiming they've struck it rich.

Experienced prospectors examine them: "Looks like gold to me."

But when confronted with exotic ores, even experts disagree. Their pattern matching fails on things they've never seen.

Then someone brings an assayer's scale. The metal either dissolves in acid or it doesn't. Binary truth.

When you write a proof, it's either correct or it's not. Formal verifiers like Lean provide perfect ground truth, while model judges are pattern matchers that fail when pushed to assess genuinely novel proofs. From an engineering angle, verification gives us efficiently scalable rewards for learning.
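As a hedged sketch of what a verifier-derived reward could look like in practice, the Python below shells out to a local `lean` binary and scores a candidate proof 1.0 if the file elaborates and 0.0 otherwise. The function name and the bare `lean` invocation are illustrative assumptions; a real setup would typically run `lake env lean` inside a project so that imports such as Mathlib resolve.

```python
import subprocess
import tempfile
from pathlib import Path

def lean_reward(candidate_proof: str, timeout_s: int = 60) -> float:
    """Binary reward from the formal verifier: 1.0 if the candidate Lean
    file checks without errors, 0.0 otherwise.

    Assumes a Lean 4 toolchain on PATH; project-specific imports would
    require running the checker inside the project (e.g. `lake env lean`).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Candidate.lean"
        src.write_text(candidate_proof)
        try:
            result = subprocess.run(
                ["lean", str(src)],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # treat timeouts as failed proof attempts
        return 1.0 if result.returncode == 0 else 0.0

# A trivially checkable theorem earns reward 1.0; a broken one earns 0.0.
print(lean_reward("theorem two_eq_two : 2 = 2 := rfl"))
```

Because the reward comes from the checker rather than a learned judge, it stays reliable no matter how far generated proofs drift from the training distribution.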

And here's the philosophical perspective on why we need formal verification: our self-consistent, observer-supporting universe follows rules that can be captured mathematically – from the laws of physics to probability theory. Mathematics is the consistent language of our consistent universe, and formal languages like Lean let us operate consistently in the mathematical world model.

We are training systems on mathematics as reality's minimal simulation – the hope is that, by learning to navigate the world of mathematics one grounded step at a time, the AI absorbs some of the fundamental rules our reality has to follow. Video generation models learn physics too. Sometimes one ponders … where do abstract reasoning and spatial reasoning join?

Conjecturing-Proving Loop Realizes Self-Improving AI

Our discovery engine

Being able to test whether gold is real is one thing; finding new veins is harder. Working in concert with verification, we enter the chapter of scientific discovery. Imagine you're in the middle of an ocean. Sailing towards new lands, of course, you start daydreaming about mathematics:

Your map is the Knowledge Base – showing where you've been. The entire corpus of mathematics indexed into a knowledge graph: definitions, theorems, and proofs. Formalized mathematics as a searchable, Internet-scale dataset.

Your ship is the Conjecturer – navigating uncharted territories. It spots distant landmasses through fog: "something valuable three days west." Built for open-ended exploration beyond known results, it samples out of distribution and generalizes with leaps guided by intrinsic motivations.

But when you spot an unknown island on the horizon, how do you know whether it's India or the West Indies? The shape looks right, the distance seems plausible, but an educated guess isn't certainty. You ask the experienced captain whose judgment you trust – that is the Prover. Successful proofs extend the knowledge base. Failed attempts provide signals for improving both the Conjecturer and the Prover. While formal verification turns "might be true" into "is true," counterexample construction shows "is false." Both grow the library.

The loop is self-reinforcing. More verified theorems mean a richer knowledge base. A richer knowledge base enables more sophisticated conjectures. More proof attempts (successful and failed) train better models. Better models generate more interesting conjectures and find proofs faster.
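Read as pseudocode, the loop might be scaffolded like the hypothetical Python sketch below. `KnowledgeBase`, `conjecture`, `prove`, and `verify` are illustrative stand-ins for the components described above, not Axiom's actual interfaces; the toy demo wires in trivial implementations just to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class KnowledgeBase:
    # Stand-in for the indexed knowledge graph of definitions, theorems, and proofs.
    theorems: set = field(default_factory=set)

def discovery_loop(
    kb: KnowledgeBase,
    conjecture: Callable[[KnowledgeBase], str],
    prove: Callable[[str], Optional[str]],
    verify: Callable[[str, str], bool],
    steps: int = 10,
) -> None:
    """One pass of the conjecturing-proving loop: propose, attempt, check, grow."""
    for _ in range(steps):
        statement = conjecture(kb)            # Conjecturer: propose a target
        proof = prove(statement)              # Prover: attempt a formal proof
        if proof is not None and verify(statement, proof):
            kb.theorems.add(statement)        # success extends the library
        # failed attempts would be kept as training signal for both models

# Toy demo with trivial stand-ins for the three components.
kb = KnowledgeBase()
discovery_loop(
    kb,
    conjecture=lambda k: f"theorem t{len(k.theorems)} : 1 + 1 = 2 := by norm_num",
    prove=lambda s: s,          # pretend the statement carries its own proof
    verify=lambda s, p: True,   # in reality: run the Lean checker on the proof
    steps=3,
)
print(len(kb.theorems))  # 3 verified statements added to the knowledge base
```

The essential property is that only verifier-approved statements ever enter the knowledge base, so the loop can run indefinitely without compounding errors.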

Axiom is building the AlphaGo for mathematics, but with infinite branching.

The Path Forward

The implications extend far beyond pure mathematics. Every complex system humans want to understand – from protein folding to quantum field theory and economic models – ultimately reduces to mathematical structures. A reasoning engine that can autonomously explore mathematical space and generate new theories doesn't just solve math problems; it provides a general-purpose tool for understanding reality.

Our founding team brings together precisely the expertise needed for this revolution. We were among the first to apply AI to compilers, bringing deep experience in programming languages and compiler technology. Our work spans from AI for mathematical discovery to pioneering self-improving systems. We're building reasoning engines that can operate in the mathematical world model at superhuman scale to tackle our most complex challenges.

The mathematical renaissance isn't coming. It's here.


AxiomProver at Putnam 2025

Putnam 2025, the world's hardest college-level math test, ended December 6th. By the end of the competition, AxiomProver had solved 8 out of 12 problems. In the following days, it solved the remaining 4. AxiomProver is an autonomous multi-agent ensemble theorem prover for Lean 4.21.0, developed by Axiom Math.

This repository contains the solutions generated by AxiomProver. An asterisk denotes solutions found after the competition.

  1. 2025 A1: [source], [graph]. Prover: 110 minutes, 7M tokens. Proof: 652 lines, 23 theorems, 561 tactics.
  2. 2025 A2: [source], [graph]. Prover: 185 minutes, 6M tokens. Proof: 556 lines, 26 theorems, 581 tactics.
  3. 2025 A3: [source], [graph]. Prover: 165 minutes, 8M tokens. Proof: 1,333 lines, 78 theorems, 1,701 tactics.
  4. 2025 A4: [source], [graph]. Prover: 107 minutes, 8M tokens. Proof: 960 lines, 32 theorems, 1,107 tactics.
  5. 2025 A5*: [source], [graph]. Prover: 518 minutes, 9.1M tokens. Proof: 2,054 lines, 52 theorems, 3,074 tactics.
  6. 2025 A6*: [source], [graph]. Prover: 259 minutes, 16M tokens. Proof: 588 lines, 28 theorems, 670 tactics.
  7. 2025 B1: [source], [graph]. Prover: 270 minutes, 7M tokens. Proof: 1,386 lines, 49 theorems, 1,841 tactics.
  8. 2025 B2: [source], [graph]. Prover: 65 minutes, 2M tokens. Proof: 417 lines, 28 theorems, 325 tactics.
  9. 2025 B3: [source], [graph]. Prover: 43 minutes, 2.9M tokens. Proof: 340 lines, 11 theorems, 422 tactics.
  10. 2025 B4*: [source], [graph]. Prover: 112 minutes, 249K tokens. Proof: 1,061 lines, 23 theorems, 1,433 tactics.
  11. 2025 B5: [source], [graph]. Prover: 354 minutes, 18M tokens. Proof: 1,495 lines, 66 theorems, 1,967 tactics.
  12. 2025 B6*: [source], [graph]. Prover: 494 minutes, 21M tokens. Proof: 1,019 lines, 30 theorems, 1,052 tactics.
