Growing on me
What I learned testing GPT-5 against Claude and Gemini on real code
I was nearly a teenager before I saw a kiwi fruit for the first time, back in the mid-1980s. I didn’t enjoy that first one much, but as time passed they grew on me.
I’ve been going through something similar with GPT-5. Sure, it’s got rough edges - and is a muddle of personalities. But as I’ve spent time with it, it’s growing on me.
Under the covers GPT-5 is actually four models:
'Fast' is the non-thinking model that returns quick answers. It’s the replacement for GPT-4o.
'Thinking' is the proper thinking model. It’s the replacement for o3.
'Thinking-mini' is the lightweight thinking model. It’s the replacement for o4-mini - and what you get bumped to when you over-use 'Thinking'.
And, if you’re willing to spend $200 per month, there’s 'Thinking-Pro' - the best (and slowest) of the four.
Although they share one name, these are different models. ChatGPT's 'router' automatically picks which model answers your question. But it’s easy to overrule - for example, if you get routed to Thinking, you can click 'Get a quick answer'…
Or add 'Think hard' to a prompt to invoke thinking (it’s something I’ve found myself typing a lot over the past week…)
While the router is useful for novice users, OpenAI seem to have accepted it isn’t for everyone, and the model chooser has returned. As a 'Plus' user I have seven models plus 'Auto' (aka the router) available.
All the models are more serious than GPT-4o; the emojis are gone and the sycophancy turned down. It is this different personality that triggered the "I lost my only friend overnight" revolt which forced the return of GPT-4o. OpenAI have a new problem on their hands: they can’t kill off older models. Personality, it increasingly seems, is going to be a key differentiator in the consumer market. xAI have shown their hand firmly here with their 'companions' - Ani and Valentine.
Coding with GPT-5
But what’s GPT-5 like for coding? I thought I’d explore by getting it and two rivals to write a better version of the steganography tool I mentioned last time. Our three candidates are Claude Opus 4.1, Gemini Pro and GPT-5 Thinking.
Let’s start by giving them the key requirements and getting them to write a combined spec & design.
I'd like to design an audio steganography tool that can take a bitmap image and encode it into an audio wave file, which can then be viewed using a spectrogram analyzer. The tool should be a command line tool written in Rust. It should use appropriate windowing functions to minimize leakage into other frequency bands and keep the image sharp. The image should be in the 500Hz - 10kHz range. The image should display clearly with a log scale of the FFT
Write a combined spec & design for such a tool. Think hard.
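For readers unfamiliar with spectrogram art, the core trick is simple: each image column becomes a short audio frame, each pixel row a sinusoid at its own frequency, and a window function tapers each frame so energy doesn’t leak into neighbouring ones. A minimal sketch of the idea - my own illustration, not any model’s output; the sample rate, frame length and log-spaced bins are assumptions:

```rust
// Sketch: map one image column to a windowed audio frame.
// Assumed parameters (not from the article): 44.1 kHz sample rate,
// log-spaced bins between 500 Hz and 10 kHz, a Hann window per frame.
const SAMPLE_RATE: f32 = 44_100.0;
const F_LOW: f32 = 500.0;
const F_HIGH: f32 = 10_000.0;

/// Log-spaced frequency for pixel row `row` of `rows` total (rows > 1),
/// so the image stays evenly spaced on a log-scale spectrogram.
fn row_freq(row: usize, rows: usize) -> f32 {
    let t = row as f32 / (rows - 1) as f32;
    F_LOW * (F_HIGH / F_LOW).powf(t)
}

/// Synthesize one frame: a sum of sinusoids, one per pixel row, each
/// scaled by that pixel's brightness, then Hann-windowed to limit
/// spectral leakage between frequency bands.
fn encode_column(column: &[f32], frame_len: usize) -> Vec<f32> {
    let rows = column.len();
    (0..frame_len)
        .map(|n| {
            let t = n as f32 / SAMPLE_RATE;
            let sum: f32 = column
                .iter()
                .enumerate()
                .map(|(row, &level)| {
                    level * (2.0 * std::f32::consts::PI * row_freq(row, rows) * t).sin()
                })
                .sum();
            // Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (N-1)))
            let w = 0.5
                * (1.0
                    - (2.0 * std::f32::consts::PI * n as f32 / (frame_len - 1) as f32).cos());
            sum * w / rows as f32
        })
        .collect()
}

fn main() {
    let column = [0.0_f32, 0.5, 1.0, 0.5]; // a four-pixel test column
    let frame = encode_column(&column, 1024);
    println!("frame of {} samples, first = {}", frame.len(), frame[0]);
}
```

A real tool would repeat this per column and stream the frames out as WAV samples; the window guarantees each frame starts and ends at zero, which is what keeps the columns crisp.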
Gemini and Claude produced similar docs - best characterised as solid starts but lacking detail. Claude immediately jumped into code snippets without any high-level explanation or overview of the tool. It reminded me of the type of design EIC engineers might produce. Gemini’s was better - it covered all the bases and was a straightforward easy read. Both docs were around 1,000 words.
But GPT-5 went to town. It produced a 2,000 word detailed design (which it described as 'tight'). The tool had 19 command-line parameters (Gemini had five, Claude six). The design was complicated and often difficult to understand.
It felt… overengineered.
I gave all three designs to the three models to get their feedback. The consensus was the GPT-5 design had a better signal processing approach, albeit the UI was somewhat overengineered. Claude summed the designs up rather well:
GPT-5, on the other hand, had nothing but praise for its design:
Implementation
With designs complete, I got the models to implement their designs. All three successfully produced working Rust code after a small number of iterations. This is a significant improvement on earlier this year, when it could take tens of cycles to get Rust that compiled, never mind functioned properly. We’re not quite at the point where the tools can one-shot complex working Rust, but we’re getting close.
Lines of code is a flawed measure, but it gives an insight. It’s probably not a surprise that while Gemini produced 210 lines - and Claude 401 lines - GPT-5 produced 791 lines. Interestingly this appears to be driven by the design: give GPT-5 Gemini’s design and GPT-5 produced 224 lines. Or give Claude the GPT-5 design - and Claude produced 600 lines.
Claude required three compile-fix cycles to get cleanly compiling code, Gemini and GPT-5 only two.
Looking at the code generated, my feelings are mixed. All the models are poor at commenting - comments are cryptic and often raise more questions than they answer. GPT-5 was the only model to include a brief summary of the tool at the top of main.rs - a nice touch, though incomplete:
//! echoglyph: turn a bitmap into a spectrogram picture (500 Hz–10 kHz by default). Subcommands:
//! - encode: image -> wav
//! - probe: dry-run, print derived grid/time
Gemini falls victim to the need to comment the fixes it made:
// FIX: Removed unused GenericImageView import. The compiler resolves the methods without it.
Comments shouldn’t document what the code once did - they should explain what it actually does.
More seriously, the Gemini implementation contains a bug which will cause horizontal banding through the image.
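The article doesn’t show the offending code, but horizontal banding classically comes from a row-to-bin mapping that rounds unevenly: when image rows and frequency bins don’t divide cleanly, some bins receive two rows while others receive none, leaving stripes. A hypothetical illustration of that failure mode - not Gemini’s actual code:

```rust
// Hypothetical: naive nearest-bin rounding. With 64 image rows spread
// over 100 bins, at most 64 bins are ever written - the rest stay
// silent and appear as dark horizontal bands in the spectrogram.
fn row_to_bin(row: usize, rows: usize, bins: usize) -> usize {
    (row as f64 * (bins - 1) as f64 / (rows - 1) as f64).round() as usize
}

fn main() {
    let (rows, bins) = (64, 100);
    let mut hits = vec![0u32; bins];
    for row in 0..rows {
        hits[row_to_bin(row, rows, bins)] += 1;
    }
    let empty = hits.iter().filter(|&&h| h == 0).count();
    println!("{} of {} bins never written: dark bands", empty, bins);
}
```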
Seeing is believing
But what really matters is the end result. Of the three, Claude’s version had the best command line help:
With Gemini I had to read the code to understand how to run it while the GPT-5 version scrolled a page of options…
And here is the output (the input is the image from my last article).
Claude
Gemini
GPT-5 Thinking
I think the winner is clear. For all its cleverness the GPT-5 version is upside down and compressed. But, in fairness, all these tools are working blind. That the results are as good as they are is impressive.
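Why upside down? My guess - the article doesn’t diagnose it - is a coordinate mix-up: image row 0 is the top of the picture, but the lowest frequency is drawn at the bottom of a spectrogram, so without a flip the image renders inverted. The fix is one line:

```rust
// Image coordinates grow downward (row 0 = top of the picture), but a
// spectrogram draws frequency upward (lowest bin at the bottom).
// Without this flip, the encoded picture appears upside down.
fn row_to_freq_index(image_row: usize, rows: usize) -> usize {
    rows - 1 - image_row // top pixel row -> highest-frequency bin
}

fn main() {
    // For a 480-row image, the top row should land on the highest bin.
    assert_eq!(row_to_freq_index(0, 480), 479);
    assert_eq!(row_to_freq_index(479, 480), 0);
    println!("flip ok");
}
```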
And so
That we now have tools which can nearly one-shot 1 kloc of Rust code is amazing. Sure, it requires a bit of handholding. As with human development, getting the requirements, spec and design right is important. Foundations matter.
My gut feel is that a combination of GPT-5 as designer and Claude as implementer may work well. GPT-5 will need reining in - but it is often easier to spot what is unnecessary than to work out what is missing. I’m growing to like its studiousness, its thoroughness. It is a useful addition to the kit bag.
As for me? I’m off to explore how to use GPT-5 alongside Claude Code. And perhaps treat myself to a kiwi fruit as well.