The hype doesn't match
GPT-5 feels more like an incremental update than the promised revolution
The hype for GPT-5 has been building over the past few years. Even as late as November last year, Sam Altman was predicting that AGI (Artificial General Intelligence) would come in 2025 - with OpenAI defining AGI as "AI that is capable of doing all of the work of an organization independently."
Then last Thursday he tweeted an image of the Death Star.
And then we got GPT-5, which OpenAI described at the launch as "like having a team of PhD-level experts in your pocket."
So does GPT-5 match the hype?
Well, no, not really. The reaction has been mixed. It’s a good model, but the days of OpenAI being the clear leader seem to be over. The biggest changes are in usability - OpenAI has added a router in front of its previous models, so you no longer need to decide which model to use; the router decides based on your question. You can also choose a personality:
And GPT-5 Thinking seems a step up from o3. But we’re not in GPT-4 or o1 territory - those models were significantly better than what had gone before. And measured against OpenAI’s own definition of AGI, it seems unlikely your organization would survive long if GPT-5 did all of its work.
The world has noticed too. Confidence that OpenAI will have the best AI model by the end of 2025 has dropped significantly.
Sam Altman has been in damage control mode since the launch. GPT-4o has made a comeback.
And then there was this chart from the launch:
Spotted the error? Yup, the blocks for o3 and GPT-4o are the same height, but have different values. It looks like someone misdrew the o3 block. Could GPT-5 have spotted this error for OpenAI? Let’s find out:
Tell GPT-5 to "think harder" (the new way to force it to use extended thinking) and it still doesn’t spot the error:
Even when you ask explicitly about the o3 block, it still misses the error:
Not that Claude Opus 4.1 is much better:
Give it a clue and it still doesn’t spot it.
But at least it notices if I steer it more explicitly…
So the models turn out to be poor at spotting basic errors in the charts.
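The check the models failed is simple arithmetic: in a bar chart, rendered heights should be proportional to the values they represent. Here is a minimal sketch of that check - the pixel heights and values below are made-up illustrative numbers, not the actual figures from OpenAI's launch chart, with the middle bar deliberately "misdrawn" to the same height as the last one:

```python
def inconsistent_bars(bars, tolerance=0.05):
    """Flag bars whose height-to-value ratio deviates from the first bar's.

    bars: list of (label, value, rendered_height_px) tuples.
    The first bar is taken as the reference scale.
    """
    reference = bars[0][2] / bars[0][1]  # pixels per unit of value
    flagged = []
    for label, value, height in bars:
        ratio = height / value
        if abs(ratio - reference) / reference > tolerance:
            flagged.append(label)
    return flagged

# Illustrative data only: the "o3" bar is drawn at the same pixel height
# as the "gpt_4o" bar despite representing a much larger value.
chart = [
    ("gpt_5", 74.9, 300),
    ("o3", 69.1, 123),
    ("gpt_4o", 30.8, 123),
]
print(inconsistent_bars(chart))  # flags the misdrawn "o3" bar
```

A five-percent tolerance allows for rounding in how charts get rendered; anything beyond that suggests the bar was simply drawn wrong - exactly the kind of error a human eyeballs instantly and the models repeatedly missed.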
Over-confidence?
There’s a thing known as the confidence heuristic where confident delivery substitutes for actual knowledge. If you’re not a domain expert then someone who talks confidently about a particular area can easily appear to be an expert. Simply put, you lack the ability to spot the imposter.
But when you’re a domain expert it becomes far easier to spot imposters. And I find that time and again with AI - in areas I’m familiar with I can spot the things that seem off, ask questions to flush out errors and steer the conversation to get what I need.
The danger comes in the areas where I’m not expert. AI appears magical, but can so often be wrong.
Take our kitchen. We’re remodeling it. So we asked Claude for advice on paint. And Claude provided advice - loads of it. It suggested a stronger color than we’d normally have gone for. And Claude made a compelling case for the stronger color. So, after using a paint sampler, we decided to go for it. And what a mistake that turned out to be. We spent last Saturday repainting the kitchen…
But returning to the charts. The challenge AI faces with images like these (and things like the picture with multiple Place Royales) is that we humans are experts at spotting those types of errors, and AI isn’t. So it’s easy for us to notice when AI fails.
And so?
It’s interesting to see that after a few years of incredible progress, OpenAI seem to be struggling. Perhaps that’s not surprising - OpenAI have had a torrid year. All the original founders (bar Sam Altman) have left in the past year, Meta has been poaching top researchers with multi-million dollar salaries, there are questions over transparency, and there is a continuing fight over OpenAI’s attempts to wriggle out of its original non-profit structure.
Time will tell whether the lack of progress is OpenAI-specific or whether the days of easy gains are over. We’ll know more in the coming months as newer versions of Claude and Gemini are rolled out.
For now, though, we still need to keep honing our "bullshit" detection skills. GPT-5 is just as likely as previous models to seduce us into believing it is an expert - and to make us paint our kitchens colors we don’t like.