The power of demolition
Can you rebuild a legacy product with Codex in a weekend?
Years ago I had to go to hospital. It was one of those old hospitals - a Victorian building with high ceilings and wide corridors. It was surrounded by newer buildings that had been added over the years. One was from the 1950s. It had lots of windows: bright but cold. Another, from the 1990s, was darker: fewer windows but warmer. The differences in ceiling height meant that the floors didn’t line up between the buildings. Floor 3 in the Victorian part was floor 5 in one of the newer attached bits. I could well believe it had a flooded basement. Or rooms that were never used.
Many large software codebases remind me of that hospital. They have evolved over the years, built with different technologies to different standards. Individually all the decisions made sense at the time, but the end result is, well, a muddle. Maintaining this collection of eclectic components (or buildings) is expensive.
One project I’ve been involved with over the past few years reminds me of that hospital. It is near the max on every axis. Huge number of languages? Tick. More microservices than you can shake a stick at? Yup. Lots of open source? Oh, yeah. Lots of unused code? Double yup. Lots of repos? Of course.
Maintaining such a creation is hard. And expensive. All that open source and all those frameworks bring a steady flow of CVEs that require constant bumping and upgrading. Doing this across tens of repos feels like wading through treacle. It has been a sobering experience for all involved.
And it’s not just maintenance that’s a problem. Performance suffers too. Over the years more and more mapping layers creep in, gluing together all the different components, transforming the shape of the data without doing any actual work. Then there are the microservices. Uncharitably, these convert function calls into encrypted network flows. A rough estimate suggests over 25% of CPU time is spent on this transformation and encryption - a lot of effort and energy for little return.
It seems very unfair. Every decision we made along the way was sensible. But, somehow, taken together, they led here. How could that happen?
Yet… We’re not alone. There are many other codebases like this. It seems almost inevitable that most large products end up in this state as they age.
What to do?
The traditional approach is to tread gently. Find the most egregious components and refactor. Try to remove some of the open source. Try to merge some microservices. Perhaps rewrite a few components. But it’s expensive and the gains are often small. And it comes with the risk of regression. Just like the production code, the test frameworks have often decayed as well - slow, brittle tests are the enemy of rapid development progress.
But AI increasingly offers another option. My recent experiments with Codex (and GPT-5) have made me question whether we are now able to undertake more ambitious rewrites. Building mdmcp (a Swiss-army-knife MCP server that provides a sandboxed command runtime, complete with a set of plug-ins giving access to Confluence, Slack, Jira and Outlook email/calendar) has made me increasingly optimistic that AI can significantly accelerate the reimplementation of legacy products.
Over the weekend I set out to find out how far I could get with reimplementing our legacy application. My goal was straightforward - retain the same external interfaces (fortunately they are all well documented), but replace the innards with something much simpler. I started with a few design principles:
Single, monolithic executable (no microservices). Simple to build, simple to deploy, simple to run.
Single repo. Avoid the multi-repo pain.
Single language: Rust.
Use industry-standard approaches and packages (the things that will be strongly represented in the training data).
Minimal use of open source.
Focus on doing one thing well - don’t attempt "future-proofing".
Ultimately my goal can be summed up in a single principle - design for maintenance. Simpler, more focused products are inevitably easier to maintain.
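To make the first principle concrete, here is a minimal sketch of what a single, monolithic executable looks like in practice. It is illustrative only - the axum/tokio choice and the module names are my assumptions, not the actual product code - but it shows the shape: things that would previously have been separate microservices become plain Rust modules behind one router, in one process.

```rust
// Illustrative sketch only: one binary, one process. The "subscribers" and
// "messaging" modules stand in for what would previously have been separate
// microservices; here a call between them is a function call, not an
// encrypted network hop.
use axum::{routing::get, Router};

mod subscribers {
    pub fn status() -> &'static str {
        "subscribers: ok"
    }
}

mod messaging {
    pub fn status() -> &'static str {
        "messaging: ok"
    }
}

#[tokio::main]
async fn main() {
    // Everything is wired together behind a single router in a single process.
    let app = Router::new().route(
        "/healthz",
        get(|| async { format!("{}; {}", subscribers::status(), messaging::status()) }),
    );

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

One binary to build, one artefact to deploy, one process to run - and far fewer places for mapping layers and serialisation overhead to creep in.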
My weekend
It was a wild weekend. By the end I had the core of the product reimplemented. I could deploy the backend resources in Azure (it uses SQL, Cosmos DB & Azure Blob Storage). It had the bottom layers of my stack implemented and working. I had over 200 unit tests. I could create subscribers. I could do basic message flows. I had a test emulator.
Now I’m not claiming to have rebuilt the whole product in a weekend. Far from it. But it is a solid start. I’m probably 20% complete - there’s another 20% to build out the remaining code, 20% for functional testing, 20% for perf/scale testing and 20% for the things I’ve forgotten. But I already have hundreds of passing unit tests, which gives me increasing confidence that what’s been built is sound.
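One pattern worth calling out (I’m describing the general idea here, not pasting actual product code) is that fast, plentiful unit tests are much easier when the bottom storage layer sits behind traits, so tests and the emulator can run against in-memory doubles rather than real Azure resources. A sketch of the shape, with illustrative names:

```rust
// Illustrative sketch: the storage layer hidden behind a trait, with an
// in-memory double that unit tests and a test emulator can use instead of
// real Azure Blob Storage.
use std::collections::HashMap;

pub trait BlobStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
}

/// In-memory stand-in for the real blob store.
#[derive(Default)]
pub struct InMemoryBlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore for InMemoryBlobStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.blobs.insert(key.to_string(), bytes);
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.blobs.get(key).map(|v| v.as_slice())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn round_trips_a_blob() {
        let mut store = InMemoryBlobStore::default();
        store.put("subscriber/42", b"hello".to_vec());
        assert_eq!(store.get("subscriber/42"), Some(&b"hello"[..]));
    }
}
```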
Of course, I’ve got some advantages. I’ve learnt from my predecessors’ hard-won experience. I’m leaning on the Rust ecosystem (so I’ve not eliminated all open source). I’m leaning into the standard best-practice approaches the model is trained on. The benefit of hindsight enables me to design for the problem at hand.
But still. 20% in a weekend? That’s crazy. This is a product that hundreds of people have worked on over tens of years.
It was also a huge amount of fun. Being able to go this fast is a blast. Codex did the coding, Claude Code provided oversight. Without these tools I’d have barely been able to start. But now? We’re in a different world.
Codex is smarter. And it thinks harder - it will easily work on a task for 30 minutes, and occasionally it’ll think for over an hour. Previous AI tools were like having a team of junior engineers. This is getting closer to having a team of seniors. It’s exciting. And scary.
The code it produces is fine. Sure, there are things I’d do differently. But there are also things I’d do worse. Codex and Claude Code keep teaching me new things. New libraries that I wasn’t aware of. Techniques I hadn’t seen before. And the code works, and passes all the tests. Ultimately if it walks like a duck and quacks like a duck then surely it is a duck?
I reckon I’ll be able to convert ~10 million lines of code down to around 100 kloc. Even if I end up with 10x more bugs per kloc, I’ll still be ahead.
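The back-of-envelope arithmetic, taking b as the (unknown) bug density per kloc of the legacy code and assuming the rewrite really is ten times worse per line:

\[
10{,}000\ \text{kloc} \times b = 10{,}000\,b\ \text{bugs today}, \qquad 100\ \text{kloc} \times 10b = 1{,}000\,b\ \text{bugs after the rewrite.}
\]

Ten times fewer bugs overall, even at ten times the defect rate.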
And so?
For now this is an active experiment. I’m not done yet. But the early signs are promising. It feels like an opportunity is opening up - those who embrace this new technology will be able to advance much faster than those who don’t.
Earlier this year Dario Amodei (CEO of Anthropic) was ridiculed by some for suggesting that 90% of code would be generated by AI by September (this month). But when you can go this fast, you only need ~2 out of every 100 developers to be using this technology to hit 90%. Amodei may only be a few months off.
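The arithmetic behind that claim, assuming (purely for illustration) that an AI-assisted developer produces code around 500x faster than an unassisted one:

\[
\frac{2 \times 500}{2 \times 500 + 98 \times 1} = \frac{1000}{1098} \approx 0.91
\]

Two developers out of a hundred working at that speed would account for roughly 90% of the code written.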
Maybe we can’t all build code as fast as I did this weekend. After all, there’s more to software engineering than just writing code. So maybe we don’t go 500x faster. Maybe it’s only 10x faster. But even then that changes everything. Conway’s law. Software architecture. Team dynamics. Software engineering is architected around humans writing the code: when that’s not true, what then?
A few months ago I drove past that old hospital. Except that the old building had been demolished and replaced with a shiny new, fit-for-purpose one. If it works for hospitals, could it also work for software?


