Growing code
Agentic coding, context anxiety, and the shift from craft to cultivation
Software performance has always intrigued me.
Consider a task which takes a computer a minute to complete. That’s quite slow - enough to disrupt your flow. Make it 10x faster. Now it’s only 6 seconds. Better, but there’s still a little friction. Make it 10x faster again and we’re at 600 milliseconds. Our task is essentially instantaneous. We can keep making it faster but, as a human, 600ms ~= 60ms ~= 6ms. There’s no longer anything to usefully gain.
The other direction - making the task slower - is also interesting to ponder. A compilation that takes 10 minutes, or 100 minutes, or even 1,000 minutes would be very frustrating, but you could work around it. But 10,000 minutes? That’s 7 days. Or 70 days? Very quickly we get to the point where it is too slow to be useful.
Something similar feels like it’s playing out with AI. Five years ago we were in the useless territory. But then, a couple of years ago, we started to move into the genuinely useful window. And, increasingly, we seem to be moving sub-minute. It’s increasingly powerful, but still with limitations.
There’s still a way to go before we reach sub-second, but there’s no sign of a plateau. If anything, things are accelerating. This week GPT-5 was joined by Claude Sonnet 4.5. And also Sora 2 - which is amazing. And troubling. In probably equal measure.
This week
I’ve spent the week building with Codex and Sonnet 4.5. Sonnet 4.5 is a good model. But Codex remains my favourite. For what I’m doing (building highly scalable, reliable systems software in Rust), Codex is just smarter. Here’s an example of a code review from Sonnet 4.5:
If only that were true. There were several subtle bugs, one of which caused a SEGV. If you’re familiar with C, you’ll know that’s bad news - SEGVs are invariably fiddly and time-consuming to fix. The worst ones can take days - sometimes weeks - to understand and fix.
My heart sank when I saw it; I expected I’d need to roll up my sleeves and help with the debugging. But when I asked Codex to investigate, it rolled out debugging tools (gdb and valgrind) - and then found and fixed the bug. All by itself. I just watched in wonder.
Wow.
Sonnet is, well, less thorough. And prone to shortcuts. Take this example where Claude Code is fixing up Rust compiler warnings about unused code. Rather than fix the underlying problem, it takes a shortcut - allowing unused code.
I wasn’t impressed.
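To make the shortcut concrete, here’s a minimal, hypothetical reconstruction (the function names are mine, not from the actual session). The lazy fix silences the `dead_code` lint with an attribute; the real fix is to delete the unused item or wire it back into the code that should be calling it:

```rust
// Hypothetical reconstruction of the shortcut - not the actual code
// from the session described above.

// The shortcut: slap an attribute on and the warning disappears,
// but the dead code (and whatever bug left it dead) remains.
#[allow(dead_code)]
fn legacy_checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}

// The proper fix: keep the function only if something actually uses it.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}

fn main() {
    // checksum() is genuinely used, so it compiles warning-free
    // without any #[allow] escape hatch.
    assert_eq!(checksum(b"abc"), 294); // 97 + 98 + 99
    println!("checksum(b\"abc\") = {}", checksum(b"abc"));
}
```

The `#[allow(dead_code)]` route compiles cleanly too - which is exactly why it’s such a tempting shortcut for a model optimising for “no warnings”.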
Sonnet also struggles more when things get complex. I got it to build a simple 3D FPS; superficially it looked brilliant, but it got the left/right movement inverted and then struggled to fix it. So turn left, and rockets fire right. We had a back and forth on this where Sonnet told me:
Sadly not. In the end I had to describe an exact repro scenario and hint at what I thought might be wrong, before Sonnet found the bug.
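Left/right inversions are almost always a single sign flip buried somewhere in the input-to-movement path. A minimal sketch of the shape of the bug (names and conventions are hypothetical, not the actual game code):

```rust
// Hypothetical sketch of a left/right inversion bug.
// Convention assumed here: input is -1.0 for "left", +1.0 for "right",
// and the returned offset is negative-left / positive-right.

fn strafe_offset_buggy(input: f32, speed: f32) -> f32 {
    // Bug: a stray negation flips the axis, so pressing left moves right.
    -input * speed
}

fn strafe_offset_fixed(input: f32, speed: f32) -> f32 {
    input * speed
}

fn main() {
    let left_input = -1.0;
    // The buggy version moves the player right (positive) on a left press.
    assert!(strafe_offset_buggy(left_input, 5.0) > 0.0);
    // The fixed version moves left (negative), as expected.
    assert!(strafe_offset_fixed(left_input, 5.0) < 0.0);
}
```

The fix is trivial once located; the hard part - for humans and models alike - is finding which of the many transforms between keypress and world-space holds the stray negation.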
I repeated my "write a short story" test with Sonnet 4.5, specifically looking at how temporally consistent the model is. Previous models have raised characters from the dead, or had them reflect on events that have yet to happen. Sonnet 4.5 still suffers from this - this time round the story involved someone agreeing to cook a special dinner on Friday evening that week. But the next few paragraphs described how they then spent three weeks preparing for it.
For building systems code, this kind of temporal incoherence matters. If the model can’t track a simple timeline in a story, can I trust its reasoning about async lifetimes or lock ordering? It’s a canary.
Which is not to say Sonnet 4.5 is a bad model. It’s not. It’s completely amazing. But it still has gaps. And, IMHO, it’s not the best coding model in the world.
Actually doing
Both models occasionally forget to actually do the work. Take this from Codex:
Or this from Sonnet:
There are other occasional odd artefacts - what happened to step 4 of this plan?
And sometimes they hit limits and back out all their work:
Hitting the limit
Speaking of limits…
A few weeks ago I suffered from context anxiety - a fear that the context window would run out before the task was completed. But, in reality, the models are good at picking up from where they left off. This has become just a minor irritation.
But my context anxiety has been replaced with usage anxiety. Codex (and Claude) both have a weekly cap on usage. And both warn you as the limit starts to draw near.
Until, finally, you get this.
Last week I ran out with ~10 hours to go. That’s OK. Unfortunately I’m learning how to scale myself, so this week I’m already at 55% usage and less than 2 days in.
Scaling yourself
There are two ways to scale yourself:
1. Run more instances of Codex/Claude Code.
2. Ensure the tools are always busy - even when you are not around.
Initially I focused on the first, but there’s a limit. There’s only so much running around between sessions I can do. And when one gets stuck, I’ve got to decide between helping it and keeping the others moving. A classic management dilemma.
But more recently I’ve started pushing the latter more and more. There are lots of hours when I’m unavailable - if I can get Codex to work when I’m eating or sleeping, then that’s another way I can scale. And I’ve found creating a multistage plan and queuing up requests works pretty well. It goes like this:
1. Get Codex to produce a multistage plan for something with lots of unrelated dev work. Writing UTs - or fixing backlog bugs - is a good choice; building a large new component less so. I aim to end up with 10-20 stages.
2. Carefully review and iterate until I’m happy.
3. Write the plan to disk.
4. /compact
5. Set Codex off working through the plan - pointing it at the plan on disk.
6. Queue a whole load of “Great. Continue.” prompts.
7. Throw in the odd “Review progress against the plan, make any necessary adjustments and update the plan.”
8. And then some more “Great. Continue.”
And then leave it to get on. Invariably Codex will run out of context before finishing the task - and get stuck. But that’s OK. And with things like the Claude Agent SDK becoming available, it’s only a matter of time before we can wrap Codex with other models that can drive it continuously.
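The queueing step above can be sketched as a tiny helper that builds the prompt list - mostly “continue” prompts with a review checkpoint mixed in every few stages. (This is my illustrative sketch, not part of any tool; the function name and the driving loop that would feed these prompts to Codex are left out as assumptions.)

```rust
// Hypothetical sketch: build the queue of prompts described above -
// mostly "Great. Continue.", with a periodic review prompt.

fn build_prompt_queue(total: usize, review_every: usize) -> Vec<String> {
    (1..=total)
        .map(|i| {
            if i % review_every == 0 {
                // Periodic checkpoint to keep the run honest.
                "Review progress against the plan, make any necessary adjustments and update the plan.".to_string()
            } else {
                "Great. Continue.".to_string()
            }
        })
        .collect()
}

fn main() {
    let queue = build_prompt_queue(12, 4);
    assert_eq!(queue.len(), 12);
    assert_eq!(queue[0], "Great. Continue.");
    // Every fourth prompt is a review checkpoint.
    assert!(queue[3].starts_with("Review progress"));
    for prompt in &queue {
        println!("{prompt}");
    }
}
```

The ratio matters: too few review prompts and the agent drifts from the plan; too many and you burn context (and your weekly cap) on bookkeeping.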
I find myself wondering whether coding will become like growing plants. You come to tend the agents every day and see whether any new flowers have opened. Water this one, prune that one, transplant these seedlings into production. It’s a strange and wonderful shift - from crafting every line to cultivating growth.
And so?
It’s been another wild week in the world of AI. Agentic coding is amazing. Addictive. Productive. It’s increasingly clear the world of coding is changing like never before. It’s not impossible that I never need to write another line of code again. I’m not sure how I feel about that.
But whether I like it or not, AI marches forward; how long until we get to the point where AI is so capable we no longer notice the capability improvements?