OpenAI on achieving IMO Gold with an LLM

Jul 20, 2025

Core takeaways

An experimental reasoning LLM achieved gold medal-level performance on the 2025 International Mathematical Olympiad (IMO).
The model scored 35/42, solving 5 of the 6 problems presented.
It operated under the same rules as human contestants: two 4.5-hour sessions without access to tools or the internet.
Submissions consisted of multi-page, natural language proofs that were graded by former IMO medalists.
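
The score arithmetic can be checked in a few lines. This is a minimal sketch that assumes the standard IMO scoring of 7 points per problem, which these notes do not state, and assumes each solved problem earned full marks; the function and constant names are illustrative only.

```python
POINTS_PER_PROBLEM = 7  # assumption: standard IMO scoring, not stated in the notes
NUM_PROBLEMS = 6        # from the notes: 6 problems were presented


def full_marks_score(problems_solved: int) -> int:
    """Score when every solved problem earns full marks and the rest earn zero."""
    if not 0 <= problems_solved <= NUM_PROBLEMS:
        raise ValueError("problems_solved must be between 0 and NUM_PROBLEMS")
    return problems_solved * POINTS_PER_PROBLEM


max_score = NUM_PROBLEMS * POINTS_PER_PROBLEM
model_score = full_marks_score(5)  # from the notes: 5 of 6 problems solved
print(f"{model_score}/{max_score}")
```

Under these assumptions, five fully solved problems account for the whole reported score, with no partial credit on the sixth.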

Significance of the achievement

The result was described as a potential "moon-landing moment" for AI and as the solution to a longstanding grand challenge.
IMO problems require sustained creative thinking, extending the reasoning time horizon from roughly 10 minutes per problem (AIME) to roughly 100 minutes.
The model produced creative proofs for novel math problems at a level only reached by elite pre-college students.
Success in a hard-to-verify domain like proof-writing marks a step beyond tasks with easily verifiable rewards.

The technical approach

The achievement came from a general-purpose reasoning LLM, not a model specifically designed for the IMO.
It relies on new techniques in general-purpose reinforcement learning and test-time compute scaling.
The model is capable of thinking for hours on a problem, a significant increase over previous models that thought for seconds or minutes.
It functions without any tools, using only natural language to reason and generate proofs.

The pace of AI advancement

This result far outpaced a 2021 forecast that AI would reach only 30% on the MATH benchmark by July 2025.
The field has rapidly progressed from grade school math (GSM8K), to the high school MATH benchmark, to AIME, and now to IMO gold.
The leap from achieving 12% on the AIME benchmark (with GPT-4o) to IMO gold took roughly 15 months.

What comes next

The IMO-level model is an experimental research model and is not planned for release for several months.
The upcoming release of GPT-5 is a separate development from this experimental research model.
Researchers believe AI is now close to being able to contribute to original mathematical research and scientific discovery.
The general research advancements from this project will be applied to improve other capabilities in products like ChatGPT.

Notes from:

marvin's notes