An open-source AI just came shockingly close to winning the most brutal undergraduate math competition on the planet.
Nous Research, a San Francisco–based AI startup, announced Tuesday that its new model, Nomos 1, scored 87 out of 120 on the 2024 William Lowell Putnam Mathematical Competition. That score would have ranked second overall among nearly 4,000 human competitors.
The top human score that year was 90.
The median score was just 2.
In other words: this AI didn’t just do well. It outperformed almost everyone.
Why the Putnam matters
The Putnam is legendary for a reason.
It’s a six-hour exam split into two sessions. Just 12 problems, each worth up to 10 points. Each one is designed to break even the strongest math students. Most participants leave with single-digit scores. Perfect scores are almost unheard of.
Winning the Putnam has long been a signal of elite mathematical talent. Past top performers include Fields Medalists and Nobel Prize winners.
That’s what makes Nomos 1’s result stand out.
This wasn’t brute force AI
Nomos 1 isn’t a giant, power-hungry model.
It’s built on a 30-billion-parameter architecture, with only 3 billion parameters active at a time, using a mixture-of-experts design derived from Alibaba’s Qwen3 model.
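The core idea of a mixture-of-experts layer is that the model keeps many expert sub-networks but routes each token through only a few of them, which is how a 30-billion-parameter model can activate only about 3 billion parameters per step. Here is a minimal, illustrative sketch of top-k expert routing; the function names, shapes, and gating details are assumptions for illustration, not Nomos 1’s actual implementation:

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Route input x through only the top-k experts (illustrative sketch).

    Most experts stay idle on any given forward pass, so compute
    scales with k, not with the total number of experts.
    """
    scores = x @ gate_weights                 # one routing score per expert
    top_k = np.argsort(scores)[-k:]           # indices of the k best experts
    probs = np.exp(scores[top_k])
    probs /= probs.sum()                      # softmax over the chosen experts
    # Only the selected experts do any work; the rest are skipped entirely.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top_k))

rng = np.random.default_rng(0)
num_experts, d = 8, 16
experts = rng.standard_normal((num_experts, d, d))
gate = rng.standard_normal((d, num_experts))
x = rng.standard_normal(d)
y = moe_forward(x, experts, gate, k=2)
print(y.shape)  # (16,)
```

With 8 experts and k=2, only a quarter of the expert parameters touch any given input, mirroring (in miniature) the 3B-active-of-30B ratio described above.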
That’s tiny compared to today’s frontier systems.
OpenAI’s most advanced reasoning models are estimated to exceed a trillion parameters. Google’s Gemini models reportedly run into the hundreds of billions.
Nomos 1 runs on a fraction of that compute.
And yet, it nearly won.
Training beat scale
Here’s the part that made researchers pause.
When Nous Research tested the same base model without its specialized training, it scored just 24 points on the Putnam.
After post-training optimization?
87 points. Eight perfect solutions.
The jump didn’t come from adding more parameters. It came from how the model was trained, guided, and evaluated.
That gap highlights a growing truth in AI research: raw size is no longer the main differentiator. Smarter training is.
How Nomos 1 actually solves problems
Nomos 1 doesn’t just spit out answers.
It runs a structured reasoning process that mirrors the actual competition.
First, multiple parallel workers attempt each problem simultaneously. They generate full written solutions, and each worker scores its own confidence in the result.
Harder problems get more attention.
As time runs out, the system enters a final selection phase. Submissions are grouped by conclusion. Then a tournament-style comparison selects the strongest proof — not necessarily the most common one.
All of this happens within the same time constraints as the real exam.
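The selection phase described above can be sketched as a grouping step followed by a pairwise tournament. This is a toy illustration under stated assumptions: the `Attempt` structure, the confidence-based `judge` comparator, and all names are hypothetical stand-ins, since the real system compares the proofs themselves rather than bare confidence numbers:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Attempt:
    conclusion: str   # the final answer a worker reached
    proof: str        # the full written solution
    confidence: float # the worker's self-assigned confidence score

def judge(a, b):
    """Stand-in pairwise comparator: prefer the higher-confidence proof.
    In the described system, a model-based judge compares the proofs."""
    return a if a.confidence >= b.confidence else b

def select_submission(attempts):
    """Group attempts by conclusion, then run a single-elimination
    tournament over the group champions to pick the strongest proof,
    not necessarily the most common conclusion."""
    groups = defaultdict(list)
    for a in attempts:
        groups[a.conclusion].append(a)
    # One champion per conclusion, then pairwise elimination.
    pool = [max(g, key=lambda a: a.confidence) for g in groups.values()]
    while len(pool) > 1:
        pool = [judge(pool[i], pool[i + 1]) if i + 1 < len(pool) else pool[i]
                for i in range(0, len(pool), 2)]
    return pool[0]

attempts = [
    Attempt("x = 7", "proof A", 0.6),
    Attempt("x = 7", "proof B", 0.9),
    Attempt("x = 5", "proof C", 0.4),
]
print(select_submission(attempts).proof)  # prints "proof B"
```

Note that the tournament runs over one champion per conclusion, so a strong proof of a rare answer can still beat a weak proof of the majority answer, matching the "strongest, not most common" behavior described above.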
To validate the results, Nous had the solutions blind-graded by a human mathematician who previously ranked in the Putnam’s top 200. The company then released the anonymized submissions and tooling publicly.
How it compares to Big Tech
Nomos 1 isn’t the highest-scoring AI math system ever built.
DeepSeek’s latest model reportedly scored 118 out of 120 on Putnam problems. Google has shown Gemini generating full proofs under competition conditions.
But those systems rely on massive, closed infrastructure.
Nomos 1 is different.
It’s open-source.
It’s lightweight.
And it can run without hyperscale compute.
That tradeoff matters — especially for researchers, universities, and companies priced out of frontier AI.
Why this matters beyond math
Mathematical reasoning isn’t academic theater.
It’s foundational to:
- Scientific modeling
- Formal verification
- Cryptography
- Safety-critical systems
- Advanced engineering
For years, the best reasoning systems were locked behind proprietary APIs. Nomos 1 suggests that wall is starting to crack.
Anyone can now inspect, run, and build on a near-elite AI mathematician.
The bigger signal
Nomos 1 dropped just days after Nous released Hermes 4.3, another open model trained partly on a decentralized network.
Together, they point to the same idea.
The future of AI may not belong only to the biggest models.
It may belong to the smartest ones.
And if a laptop-scale AI can outperform nearly 4,000 elite math students, that future may arrive sooner than anyone expected.