The old rules don't work
And why three's a crowd when you bring agents into the mix
One afternoon, around 20 years ago, my phone rang. "Great news," began the voice on the other end. "I’ve sold another million seats. All you need to do is implement these easy features and have them in production in four weeks’ time."
At the time I ran a development team; we sold our software to company X, who then resold it to customers throughout Europe. It was not an easy relationship; company X were somewhat dysfunctional. As an example, a few months later this same person would raise an urgent support call to report a total outage; we soon discovered company X’s operations team had disconnected all the kit used to run the service and put it in the back of a van which was now halfway around the M25 on its way to be scrapped. Oops. But I digress.
Despite the epithet, this was far from "great news". It wasn’t clear what the "easy" features should do. It was obvious some would conflict with existing features. Others didn’t even seem possible. What was clear was that they’d take far longer than four weeks to implement. But that wasn’t going to get in the way of our customer. Having signed their biggest deal, they weren’t going to give up without a fight. And so ensued several weeks of shouting until, inevitably, the laws of physics exerted themselves.
I didn’t realise it at the time, but this was one of my earliest experiences of vibe coding: trying to convert vague requirements into a product through increasingly passionate shouting, before ultimately being disappointed with the results.
The three types of software development
Currently, software development can be divided into three categories:
There’s the old world. Hand-written code, following some variant of the traditional req→spec→architecture→HLD→LLD→implementation→code-review→testing flow.
There’s AI-assisted. AI auto-complete is used to help write chunks of code under close human supervision. The traditional software flow is largely unchanged, other than some sections getting a speed-up.
Then there’s agentic AI. This is the world that Codex & Claude Code are creating. It’s a different way of developing software. And right now it’s a wild west. There aren’t any accepted best-practice development flows; at one end there’s extreme vibe coding; at the other, variants of established flows.
Agentic wild-west
This agentic world is seductive. It offers the tantalizing prospect of software commoditization. A world where it’s trivial to create software. Where we can build customized software to solve specific problems.
And it’s already happening - Claude and ChatGPT write code all the time to solve problems. Claude Artifacts let anyone build small interactive websites. I’ve built hundreds, from interactive histories of Canada to quizzes to simple time-tracking tools.
These simple apps can be vibe coded successfully. But there’s a big difference between asking a model to implement Tetris and building an operating system. The former works well today. The latter? It doesn’t matter how much you shout, it’s not going to work.
Nor will it ever work. Because, while the requirements for Tetris are well understood, the requirements for an operating system need thought. Working out what to build remains critical; clear requirements matter just as much as ever. That’s the first problem.
But after that? What does the new world of software development look like?
The second problem is that you are only as fast as the slowest stage.
One of the current big debates is about code-review. Should all AI-generated code be reviewed by hand?
Maybe. But let’s run the numbers.
The best code-reviewers can manage ~2.5kloc per week. Or ~100kloc per year. In the past month I’ve created ~300kloc. So that’s three years’ worth of code review. In a month. Code review is now roughly 40x slower than creating code.
That’s clearly not going to work. But before we throw our hands up, let’s ask: how good is human code review anyway? Anyone who has written code professionally knows review quality is variable: talent, focus, tiredness, time pressure - they all play a part. We stick with it because, up to now, it’s been an effective way of finding bugs. But what if there are other ways of finding those bugs?
First, we can repeat tasks in ways that humans would object to. Normally a design might get 2 or 3 review cycles. But I’ve had Claude spend a day reviewing and re-reviewing a design. Tens of times. Claude didn’t mind, and it was still finding useful things after the first few hours. No human would be willing - or able - to review a design that many times.
Similarly, code-review is no longer a one-shot deal; a swarm of Codex agents reviews my codebases every night.
And then there’s testing. Most codebases have a suite of tests that get run overnight. And often fail. I know all too well the joy of a day spent fixing the overnight tests… …only for them to fail again the following night. But now AI can not only run the tests overnight, it can also fix them. Every morning I arrive to a codebase where all the tests pass. It’s rather nice.
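There’s no magic in the plumbing here. A minimal sketch of that overnight loop, in Rust, might look like the following - it assumes a coding agent with a headless mode (claude -p is used purely as an illustration; any agent CLI with a non-interactive mode would do), and the prompt and retry limit are invented for the example:

```rust
use std::process::Command;

fn main() {
    // Illustrative only: cap the number of fix-up rounds per night.
    const MAX_ATTEMPTS: u32 = 5;

    for attempt in 1..=MAX_ATTEMPTS {
        // Run the full test suite.
        let tests = Command::new("cargo")
            .arg("test")
            .status()
            .expect("failed to run cargo test");

        if tests.success() {
            println!("tests green after {attempt} attempt(s)");
            return;
        }

        // Tests failed: hand the codebase to an agent, then re-check.
        // The exact CLI and prompt are assumptions, not a recommendation.
        println!("attempt {attempt}: tests failing, invoking agent");
        let agent = Command::new("claude")
            .args(["-p", "Run cargo test, diagnose the failures, and fix them."])
            .status()
            .expect("failed to invoke agent");

        if !agent.success() {
            eprintln!("agent exited with an error; giving up for tonight");
            std::process::exit(1);
        }
    }

    eprintln!("tests still failing after {MAX_ATTEMPTS} attempts");
    std::process::exit(1);
}
```

Schedule something like that from cron and you get the morning-green effect; the interesting decisions are all in the prompt, not the loop.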
But it gets better. AI makes adding tests cheap. Really cheap. I have agents scouring my codebase every night to add more tests. I’ve tied this to code coverage so the agents have a target to work against. I’m using Rust on Linux - there’s a choice of tarpaulin or llvm-cov for coverage. In the human world I’d have had to decide which to focus on first; in the AI world I just choose both.
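The target itself can be as blunt as a pair of coverage gates the agents have to keep green. A sketch, assuming cargo-tarpaulin’s --fail-under and cargo-llvm-cov’s --fail-under-lines options (both tools are installed separately, and the 80% threshold is arbitrary):

```rust
use std::process::Command;

fn main() {
    // Arbitrary line-coverage target, in percent.
    let threshold = "80";

    // Both tools (assumed flags) exit non-zero when coverage drops below
    // the threshold, so the exit status is all we need to inspect.
    let gates = [
        ("tarpaulin", ["tarpaulin", "--fail-under", threshold]),
        ("llvm-cov", ["llvm-cov", "--fail-under-lines", threshold]),
    ];

    let mut all_green = true;
    for (name, args) in gates {
        let status = Command::new("cargo")
            .args(args)
            .status()
            .expect("failed to run cargo");

        if status.success() {
            println!("{name}: coverage at or above {threshold}%");
        } else {
            eprintln!("{name}: coverage below {threshold}%");
            all_green = false;
        }
    }

    // A non-zero exit gives the agents (and CI) an unambiguous signal.
    if !all_green {
        std::process::exit(1);
    }
}
```

Because the gate is just an exit code, the same check works for a human in CI and for an agent verifying its own overnight work.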
Could this become a new version of CI/CD - maybe Swarm Improvement?
And the end result? A better tested codebase than I’ve ever experienced before. One where I’m increasingly comfortable not reviewing every - or any - line of code myself.
The third problem is finding fracture lines.
In the old world we had rules of thumb about how many developers we needed for a project. In this new world everything is, well, smaller. Two other engineers and I have been working on a 150kloc project. Or trying to. Because we keep tripping over each other. Trying to merge our monster check-ins has become increasingly painful.
I briefly experimented with Git worktrees to enable me to work on multiple changes in parallel. But I quickly realised they made the problem worse - now I had to contend with merges between my changes as well as the ones the other engineers were making. Even more merge fun.
Each AI agent has the potential to be a >10x engineer that works 24x7. Run several of those in parallel and your single human engineer is suddenly equivalent (at least from a code-generation point of view) to a hundred humans. Now, maybe you can’t achieve that throughput continuously. But even if they are only 10x faster you’ve still got a problem.
And so?
So what’s changed since that phone call?
Clear requirements still matter - that’s physics. But three engineers now trip over each other where we once needed ten. Code review became a 40x bottleneck, so I stopped doing it. Tests fix themselves overnight. Git worktrees made merge conflicts worse, not better.
The old rules don’t work. But I don’t know what the new ones are yet.
Would this have helped company X? They still didn’t know what they wanted. And shouting, as one of my old teachers used to say, is never the solution.

