Fiona Jackson
2025-07-21 16:39:00
www.techrepublic.com
OpenAI’s latest model has achieved a gold-level score on the 2025 International Math Olympiad. It solved five of the six questions under exam conditions, scoring 35 out of a possible 42 points (each question is worth seven points).
The International Math Olympiad is widely regarded as the most prestigious and challenging mathematics competition for high school students in the world. Only about 10% of this year’s competitors received gold medals, and past gold medalists include numerous Fields Medalists. Each competitor has two 4.5-hour sessions to complete the six questions without access to the internet or any tools.
AI models’ mixed success at solving math problems
Artificial intelligence models are not known to excel at complex mathematical problems because they can struggle with logical reasoning. And yet, recently, Gemini 2.5 Pro and OpenAI’s o3 scored 86.7% and 88.9%, respectively, on the American Invitational Mathematics Examination (AIME), a key math benchmark for AI models, and Grok 4 reportedly achieved a perfect 100% on it. By comparison, in September 2024, o1 scored 83% on a qualifying exam for the International Math Olympiad.
“IMO problems demand a new level of sustained creative thinking compared to past benchmarks,” OpenAI researcher Alexander Wei posted on X after announcing the unreleased model’s milestone. His colleague, Noam Brown, said that just last year, AI labs were using grade school math as a benchmark, referring to the GSM8K test.
OpenAI CEO Sam Altman said the experimental model was “an LLM doing math and not a specific formal math system” like AlphaGeometry, presenting the result as a sign of the company’s progress toward general intelligence.
Manon Bischoff, an editor at the German-language version of Scientific American, predicted in January 2024 that it would be “a few years” before AI models could conceivably compete in the International Math Olympiad, but the models have improved faster than that. At the time, Bischoff was reporting on the release of the math-specific model AlphaGeometry, which could solve 54% of all the geometry questions included in the competition over the last 25 years. By February 2025, a second-generation version could solve 84% of them.
Questions arise about OpenAI’s gold medal at IMO
Not everyone is convinced of OpenAI’s leaps and bounds in mathematical capabilities.
According to Google DeepMind researcher Thang Luong and Mikhail Samin, executive director of the AI Governance and Safety Institute, OpenAI’s model was not graded based on the International Math Olympiad’s official guidelines, and thus its claim to a gold medal is not verifiable. Wei said on X that “three former IMO medalists independently graded the model’s submitted proof” and reached “unanimous consensus” on their scores.
OpenAI doesn’t have the strongest reputation when it comes to benchmarking the mathematical ability of its models. In April 2025, Epoch AI, the independent research institute behind the FrontierMath benchmark, found that the o3 model could correctly answer only about 10% of the advanced problems, well below the accuracy of more than 25% that OpenAI originally claimed in December 2024.
It will be difficult for anyone to conduct the same level of independent verification on the experimental model that took part in the Olympiad until it is released. Unfortunately, Wei confirmed that OpenAI does not “plan to release anything with this level of math capability for several months,” and as GPT-5 is coming “soon,” it’s unlikely that this experimental system will be part of that release.
Mathematical ability is clearly an important quality for OpenAI. Last month, it released the o3-pro model, which it dubbed its most intelligent yet.