2025-08-12 10:32:00
www.pcgamer.com
What to make of OpenAI’s latest GPT-5 chatbot? Let’s just say the reception from users has been sufficiently mixed to have OpenAI head honcho Sam Altman posting apologetically on X. And more than once. But one thing we can say for sure, the charts in the launch video were a bizarre mess that OpenAI has since attempted to tidy up, to mixed avail.
Most obviously, the claimed SWE-bench performance of GPT-5 versus older model shown on launch day was badly botched. The chart showed accuracy figures of 74.9% for ChatGPT 5, 69.1% for OpenAi o3 and 30.8% for GPT-4o.
Problem is, the bar graph heights were exactly the same for the latter two, giving the at-a-glance impression of total dominance for GPT-5 when in fact it is only marginally superior to OpenAI o3.
It’s a basic enough mistake that you have to wonder whether OpenAI used, well, GPT-5 itself make the charts and couldn’t be bothered to proof them. Later on in the video there’s another graph showing, not a little ironically, the deception rate of GPT-5. In this chart, it shows the bar graphs for GPT-5 and OpenAI o3.
GPT-5 scores a “coding deception” rate of 50%, OpenAI o3’s is 47.4%. But the bar for OpenAI o3 is rendered roughly three times higher than that of GPT-5. Now, you could recognise that a lower deception rate is better and make some kind of convoluted argument for therefore making OpenAI o3’s bar higher.
Apart from the fact that this approach still doesn’t account for the large discrepancy in bar height, the problem is that on the same slide OpenAI also shows stats for “CharXiv missing image”. And here the bars are accurately proportional to the percentage results, with the 9% for GPT-5 a tiny fraction of the height of the 86.7% for OpenAI o3.
Another wonky chart cooked up by AI? Some kind of subtle satire? Just lazy, sloppy work? The usual adage probably applies, so assuming conspiracy where mere incompetence will suffice is probably unjustified. But it certainly implies a level of complacency that squares with the overall sense of entitlement and lack of rigour and accountability that surrounds the AI industry at large.
OpenAI has since posted some updated charts on its website. The new deception rate chart certainly suggests that a mere mistake was made. The revised stats show GPT-5’s coding deception rate at 16.5%, which squares with the bar heights in the launch video.
However, while the bar height in the SWE-bench chart has also been corrected, OpenAI added a further disclaimer, pointing out that the figures were achieved using 477 tasks within the SWE-bench suite, not the full 500.
When we compare directly to Anthropic’s Claude Opus 4.1, which has achieved a score of 74.5% on SWE-bench Verified, GPT-5 appears to score slightly better at 74.9%. But where are these 23 missing problems? What are they? pic.twitter.com/F8BSZnxLZtAugust 11, 2025
That has lead some observers to question whether a few inconvenient tasks were left out in order to allow GPT-5 to hit 74.9% and thus edge marginally ahead of the 74.5% score racked up by Anthropic’s Claude Opus 4.1 model. Indeed, this raised the eyebrow of none other than Elon Musk, too.
Meanwhile, original launch video with the messed up charts is still there on OpenAI’s YouTube channel, implying OpenAI isn’t all that bothered. Whatever is going on, exactly, it’s all fairly unsightly. At best it’s unbecoming of an organisation that supposedly produces artificial “intelligence.” At worst, it’s thoroughly unnerving if managing the dangers AI and all our safety depends on these people.

Best gaming PC 2025
Take your gaming to the next level! The Redragon S101 RGB Backlit Gaming Keyboard is an Amazon’s Choice product that delivers incredible value. This all-in-one PC Gamer Value Kit includes a Programmable Backlit Gaming Mouse, perfect for competitive gaming or casual use.
With 46,015 ratings, an average of 4.6 out of 5 stars, and over 4K+ bought in the past month, this kit is trusted by gamers everywhere! Available now for just $39.99 on Amazon. Plus, act fast and snag an exclusive 15% off coupon – but hurry, this offer won’t last long!
Help Power Techcratic’s Future – Scan To Support
If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.
As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!
BITCOIN bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge Scan the QR code with your crypto wallet app |
DOGECOIN D64GwvvYQxFXYyan3oQCrmWfidf6T3JpBA Scan the QR code with your crypto wallet app |
ETHEREUM 0xe9BC980DF3d985730dA827996B43E4A62CCBAA7a Scan the QR code with your crypto wallet app |
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.