Beer & o3?
Some partnerships need supervision
Bread and butter. Beer and crisps. Some things are just better together. The whole is more than the sum of the parts.
And after a couple of weeks of intensive work with o3, I've found myself wondering: is this another such pairing? How much of o3's magic comes from the combination of human and AI working in harmony? Do I still have value in this brave new world?
Pushing o3
Over the past few days I've been experimenting with pushing the envelope with o3. Sure, o3 can write lots of code really quickly and accurately. It’s amazing. But what about some of the gnarlier corners of software development? Performance. Memory leaks. Threading models.
The results were interesting. And much more uneven.
I got o3 to review the code I’ve been working on for potential improvements. Michael Abrash (who worked on the graphics engine for Quake) is well known for emphasizing that you must measure performance rather than making assumptions about what's slow. Would o3 follow his philosophy?
Err, no. Instead o3 made some assumptions and jumped to rewriting code.
Not a great start. As it turned out these changes made about a one percent improvement.
I added instrumentation. Which showed most of the CPU time was consumed in a regex (to find email addresses in the body of an email). o3’s solution? Precompile the regex. As you might expect that made precisely no difference.
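To make "measure first" concrete, here is a minimal sketch of the kind of instrumentation that can surface a hotspot like this. It is not the code from the app (which is C#); the names `timed`, `report`, `totals` and `calls` are hypothetical, and it measures wall-clock time rather than CPU time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per labelled section.
totals = defaultdict(float)
calls = defaultdict(int)


@contextmanager
def timed(label):
    """Time the enclosed block and add the result to the running totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        calls[label] += 1


def report():
    """Return (label, seconds, call_count) rows, slowest section first."""
    rows = [(label, totals[label], calls[label]) for label in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Wrapping each suspect stage in `with timed("find_emails"):` and printing `report()` at the end of a run shows where the time actually goes, before any code gets rewritten.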
It required a lot of discussion before I got o3 to propose the idea of replacing the regex with custom code to detect email addresses. Which o3 then wrote perfectly. And which nearly doubled the performance.
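The post does not show either the regex or the replacement, so the following is only a sketch of the general technique: find each `@` with a plain string search, then expand left and right over the characters allowed in an address. The pattern `EMAIL_RE`, the character sets and the function `find_emails` are all assumptions, and the hand-written scanner is not claimed to accept exactly the same strings as the regex.

```python
import re
import string

# Hypothetical pattern, shown only for comparison with the scanner below.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

LOCAL_CHARS = frozenset(string.ascii_letters + string.digits + "._%+-")
DOMAIN_CHARS = frozenset(string.ascii_letters + string.digits + ".-")


def find_emails(text):
    """Find address-like substrings by anchoring on '@' and expanding outwards."""
    found = []
    pos = text.find("@")
    while pos != -1:
        # Walk left over characters allowed in the local part.
        start = pos
        while start > 0 and text[start - 1] in LOCAL_CHARS:
            start -= 1
        # Walk right over characters allowed in the domain.
        end = pos + 1
        while end < len(text) and text[end] in DOMAIN_CHARS:
            end += 1
        # Drop trailing punctuation such as the full stop ending a sentence.
        domain = text[pos + 1:end].rstrip(".-")
        end = pos + 1 + len(domain)
        if start < pos and "." in domain:
            found.append(text[start:end])
        pos = text.find("@", end)
    return found
```

Whether hand-written scanning beats a regex depends heavily on the runtime and the pattern (in Python the regex engine is compiled C, so the gain seen in the C# app may well not reproduce), which is one more reason to measure both versions on real data before switching.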
Memory leaks
My app uses COM. And COM is notorious for memory leaks. It’s often unclear what methods create new COM objects. But you, dear caller, are responsible for working this out and then ensuring things are freed. Ugh.
o3 was delighted to review the code (it wrote) for memory leaks. And soon found some. But it also missed a few significant ones. And once again it didn’t have a strategy for finding the leaks. In the end I got it to write a new class to wrap COM allocations/deallocations and add logging to periodically monitor the currently allocated pool.
And then we had our first fight. Changing all the COM allocations/deallocations to use the new class is tedious. And something that o3 would be good at. So I asked it to list the changes required:
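The wrapper class itself is not shown in the post. As a rough illustration of the idea, here is a sketch in Python rather than the app's C#: every acquisition is registered with a label saying where it came from, every release removes it, and the set of still-live entries can be logged periodically. `ComTracker`, `Tracked` and the origin labels are hypothetical names; a real C# version would also release the underlying COM object (for example via `Marshal.ReleaseComObject`) at the point marked in the comment.

```python
import threading
from collections import Counter


class ComTracker:
    """Registry of live wrapped objects, recording where each was acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = {}  # token -> origin label
        self._next_token = 0

    def acquire(self, obj, origin):
        """Register obj as live and return a wrapper that can release it."""
        with self._lock:
            self._next_token += 1
            token = self._next_token
            self._live[token] = origin
        return Tracked(self, token, obj)

    def _forget(self, token):
        with self._lock:
            self._live.pop(token, None)

    def live_count(self):
        with self._lock:
            return len(self._live)

    def snapshot(self):
        """Count live objects per origin; suitable for periodic logging."""
        with self._lock:
            return Counter(self._live.values())


class Tracked:
    """Wrapper around one acquired object; safe to release more than once."""

    def __init__(self, tracker, token, obj):
        self._tracker = tracker
        self._token = token
        self.obj = obj

    def release(self):
        if self.obj is not None:
            # A real wrapper would release the underlying COM object here.
            self.obj = None
            self._tracker._forget(self._token)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

If `snapshot()` is logged on a timer, an origin whose count keeps climbing points at the call site that is leaking, which gives the strategy that reading the code by eye lacked.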
Ah - it’s given me a complete list - that’s good. Except, on closer inspection, the list wasn’t complete:
I don’t want to review the method - I want you, o3, to do it. That’s what I’m paying you to do. But o3 was insistent on leaving some of the work as an ‘exercise for the reader’.
Threading
And then we came to threading. MAPI COM requires the Single-Threaded Apartment (STA) - all COM objects must be accessed from the same thread where they were created.
But modern C# primarily uses Tasks and async/await for asynchronous operations, which run on top of a thread pool whose threads use the MTA (Multi-Threaded Apartment) model. It’s lovely, but incompatible with MAPI COM. Despite clearly knowing that MAPI COM needed to use STA, o3 was determined to use tasks and async/await. I had to put my foot down to get it to switch to using STA.
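One common way to satisfy a "same thread for everything" rule is to own a single dedicated thread and send all the work to it through a queue. The sketch below shows that pattern in Python, which has no notion of COM apartments, so it only illustrates the thread affinity; `SingleThreadExecutor` is a hypothetical name and this is not the app's code. In C# the equivalent thread would be marked STA (for example with `Thread.SetApartmentState(ApartmentState.STA)`) before it starts.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadExecutor:
    """Run every submitted callable on one dedicated worker thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Objects with thread affinity would be created and used only here.
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel posted by shutdown()
                break
            future, fn, args = job
            if not future.set_running_or_notify_cancel():
                continue  # caller cancelled before the job started
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, fn, *args):
        """Queue fn(*args) for the worker thread and return a Future."""
        future = Future()
        self._jobs.put((future, fn, args))
        return future

    def shutdown(self):
        """Stop the worker after the jobs already queued have run."""
        self._jobs.put(None)
        self._thread.join()
```

Callers on any thread use `executor.submit(fn).result()`, so the rest of the program can stay concurrent while every call that touches the thread-bound objects is serialized onto the one thread that created them.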
These experiments revealed both o3's incredible capabilities and its current limitations.
And so?
o3 is fantastic for getting going. For writing self-contained methods. For following well-trodden paths. But it reminds me of a junior developer who can write beautiful code but hasn't yet been burned by race conditions or memory leaks. One who doesn't (yet) understand...
That’s where experience comes into play. Thirty years of battle scars mean I can spot the traps o3 is inadvertently creating for itself. I can gently (and sometimes forcibly) redirect o3. I can ensure o3 is focused on doing what o3 does well - creating reliable code amazingly quickly. We make a great team. We are better together.
But it’s not the case that all developers will benefit. If you lack experience, the relationship with o3 is going to look very different. Both you and o3 will share the same blind spots. o3 won’t help you with your unknown unknowns. And that's dangerous - we're entering an era where inexperienced developers armed with o3 can create complex systems faster than ever before. Systems that work perfectly in testing but fail catastrophically in production.
For now, I'm one of the lucky ones: my battle scars give me an edge in this new world. But as o3 and its successors democratize software development, we may be creating a perfect storm: more developers, building more complex systems, faster than ever before, all without the hard-won wisdom that comes from years of things going wrong. That's not just a recipe for technical debt - it could turn into a blueprint for disaster.




I think you’re bang-on in this article. While I’ve been astonished at the productivity boost Claude has given me in my own programming, it would be very easy for me to underestimate (or modestly downplay!) my own experience and capabilities. In a previous article you commented that AI is making people who are already good at something even better - I think this is a case in point. Not that it will always be so, but I think it will for some time.
As another example of where the current generation of AIs falls down, I’ve spent too much time over the last week trying to get Claude to help me build a (fairly complex) USB app for the Pico using tinyusb. And it has sucked - Claude that is, as well as the overall experience. Tinyusb is not well documented, doesn’t have comprehensive examples, the API seems to have changed substantially over time, and I suspect the model tinyusb uses is a bit unusual. All of this seems to have combined to make it something that AI is poor at. Once I have got to grips with it I can give Claude clear instructions and get good code out.
On the other hand, when I was hitting a hang on one of my cores, Claude (out of the blue from my perspective) suggested an alignment fault. I was a long way from considering this, but it turned out to be right.
A thought on o3 refusing to do a bunch of “grunt-work” for you: I wonder if that is deliberate, to push you (or an AI agent) towards a cheaper model for that.