For whom the bell tolls
It tolls for thee, software engineer
When I was younger my life ran to the ring of a bell. School bells. Clock chimes. My alarm clock. When I worked in a supermarket there was the cashier bell: one ring for help, two for a supervisor, three for another cashier.
And over the past couple of weeks bells have made a comeback. Every time Codex completes a task it plays a little bell-themed tune. I hear it when I’m working. I hear it when I’m eating. When I’m cooking, or brushing my teeth, or showering or watching TV. I hear it in my dreams.
Now I configured Codex to play this tune - I want to know when it has finished a task. And GPT-5 created the tune. But I realise I’m becoming a slave to Codex. Or perhaps an addict. Take your pick.
Codex continues to amaze. And terrify. I’ve now created ~100kloc in the past few weeks. In the hand-written code world that represents 10-50 years’ worth of effort (I had to check those numbers a few times). Sure, there have been long days. And not all the code is production ready, or hardened, or fully tested. But even if Codex turns out to be only 10x faster, everything still changes.
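For anyone who wants to check those numbers too, here is a minimal Python sketch of the arithmetic. The only inputs are the ~100kloc and 10-50 year figures above; the function name is made up for illustration.

```python
def implied_lines_per_year(total_lines: int, years: int) -> float:
    """Lines per year a hand-written effort must sustain to produce total_lines."""
    return total_lines / years


TOTAL_LINES = 100_000  # the ~100kloc figure from the text

# The two ends of the 10-50 year range quoted in the text.
for years in (10, 50):
    rate = implied_lines_per_year(TOTAL_LINES, years)
    print(f"{years} years -> {rate:,.0f} lines per year")
```

In other words, the 10-50 year range assumes a hand-written pace of somewhere between 2,000 and 10,000 lines a year.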
I’m working on two large projects in parallel. On one, I’ve got provisioning load tests running. On the other, a SIP/RTP/media mixer stack is coming together. And in the background I fill my waiting time experimenting.
I’ve reimplemented Windows Solitaire in Rust using Windows-RS (5kloc). I’ve ported the first Windows application I ever wrote (a simple card game, 'Stop The Bus', written in Turbo Pascal for Windows) to Rust (3kloc). I’ve built a train simulator using three.js. I’m building a pipeline that assembles timelapse videos from a random jumble of photos. I’ve built a dungeon crawler - you are a piece of data trapped in Azure Blob Storage - you need to find your metadata and then escape through a maze of software to get your message to the outside world.
I’ve started to experience "context anxiety". Codex CLI displays the remaining context in the bottom right of the screen. Is there enough left to complete this 'turn'? Should I compact and risk losing some context? Or chance that it’ll work?
When it goes wrong this happens…
Fortunately Codex is mostly pretty good at recovering.
Then there’s Ctrl+C. Which for most apps is copy, but for Codex means 'close immediately'. Type it in the wrong window and sadness ensues.
gpt-5-codex
Last week OpenAI released a variant of GPT-5 specifically for Codex: gpt-5-codex. It’s a good model. It’s sharper than GPT-5. Thinks longer. A lot longer. Give it the right task and it’ll work for hours:
I have multiple of these running in parallel. My host machine finds itself very busy.
Occasionally gpt-5-codex has some strange output:
And sometimes it messes up:
Being an idiot, I hadn’t pushed for a while. Fortunately Codex was able to reconstruct the changes from the old context - and I had sufficient context free to make the changes. Lesson learnt - push frequently.
Sometimes it proposes timing from the old world.
Nah, we’re going to complete Stage 0 right now and get coding immediately. No need to wait.
Other times it’ll claim to need my help.
Nope. Codex you’re perfectly capable of finding, downloading and installing those headers.
And me?
My job has changed. I spend much of my time thinking about how to build things. What’s the high-level architecture? What components do we have? What are the interfaces between them? What is the UI like? How do we test? How do we build the test shims? I write docs, then discuss my thinking with GPT-5 when I’m out in the hills in the morning. I’ve learnt to talk over GPT-5’s endless "You’re on the right track" waffle. But it is incredibly useful to be able to get it to go and look up information about protocols, interfaces, tools.
Some of my time goes on unblocking Codex. It can’t solve everything yet. An early version of Stop The Bus had some interesting visual corruption.
Despite multiple attempts Codex couldn’t solve this. So I dug out Resource Hacker and compared the bitmaps in the exe with the input bitmaps. And discovered the modern Windows resource compiler was corrupting the images. Armed with that clue, Codex went off and dumped the images from the binary. And proved the resource compiler was adding an 8-byte palette to the start of the red card bitmaps.
Sure, I provided the clue. But I was seriously impressed with the way Codex worked through the diagnosis. I don’t know many engineers who’d have been as thorough.
And so…
It’s clear we’re entering a new world.
First, working with AI agents becomes more and more like working with a team. A team I increasingly trust. I’ve managed many humans over the years; it was impossible for me to review all the work they produced, so I relied on trust. And a sense of smell. And it’s the same with Codex; every time Codex admits it screwed up, every time it impresses me with an insightful analysis, my trust goes up a bit.
Ultimately I look at it in two ways:
If Codex enables me to replace a legacy product with something that has 10x less code in it, then I can afford to have 10x more bugs per kloc and the quality remains the same (OK, not all bugs are the same, but to a first approximation…). And I’m not convinced the code Codex produces is much worse than human code.
Secondly, we’ve all come to trust compilers. We no longer review the assembly they produce. Something similar feels likely to happen with AI. As I’ve said before if it walks like a duck and quacks like a duck then is it a duck?
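The first of those two points is just multiplication, and a minimal Python sketch makes the first approximation explicit. The codebase size and bug density below are hypothetical figures picked purely for illustration, and the function name is made up; the only thing taken from the argument is the 10x / 10x trade-off.

```python
def expected_bugs(kloc: int, bugs_per_kloc: int) -> int:
    """Total expected bugs, treating every bug as equal (a first approximation)."""
    return kloc * bugs_per_kloc


# Hypothetical figures, chosen only to illustrate the 10x / 10x trade-off.
LEGACY_KLOC = 500
LEGACY_BUGS_PER_KLOC = 2

# The rewrite has 10x less code but is allowed 10x more bugs per kloc.
rewrite_kloc = LEGACY_KLOC // 10
rewrite_bugs_per_kloc = LEGACY_BUGS_PER_KLOC * 10

legacy_total = expected_bugs(LEGACY_KLOC, LEGACY_BUGS_PER_KLOC)
rewrite_total = expected_bugs(rewrite_kloc, rewrite_bugs_per_kloc)

print(f"legacy:  {legacy_total} bugs")
print(f"rewrite: {rewrite_total} bugs")
```

Both totals come out the same, which is all the claim says. It does assume that every line of the old codebase was equally likely to hide a bug that matters, and that is the part worth arguing about.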
Second, reflect on the progress of the past 12 months. A year ago, pre-reasoning models, it was a struggle to get an AI model to write a few hundred lines of Rust. In a year - just a year - AI is now able to write thousands of lines of Rust. And it’s decent code. Where will we be in a year’s time?
Consider the existential risk for incumbents. Large software houses struggling to embrace AI. When I’ve got a tank it doesn’t matter how many clubmen you’ve got - I’m going to win.
And what’s going to happen to all the software engineers we’ve currently got? Do we need 10x or 100x more code? Plus building with Codex is a different skill from writing by hand - folk need to learn and retrain. Consider the railways - much of the UK’s railway network was built by hand, by armies of men, the navvies. That job no longer exists. It’s been replaced by a much smaller number of skilled roles - digger operator, dumper truck driver. Is something similar about to happen to software engineering?
I find myself with mixed feelings. It’s exciting being able to realise the projects I’ve had in my head for years. But the change our industry is about to undergo is terrifying…
Fortunately the Codex bell is ringing again so I can distract myself. For now, at least…











Not at all convinced I agree with this: “If Codex enables me to replace a legacy product with something that has 10x less code in it, then I can afford to have 10x more bugs per kloc and the quality remains the same”. It depends why you’re able to cut the code down so drastically. If it’s primarily because the old codebase was full of dead code that never got executed or legacy features we can live without now, you need to have stripped out bugs at the same rate you stripped out code, so your bugs-per-kloc ratio needs to be about the same to maintain quality.