Learning from Downton Abbey
A manifesto for agentic software engineering
Recently we’ve been rewatching Downton Abbey; it’s as good today as ten years ago. As ever, Maggie Smith is utterly fantastic; the show is worth watching just for her portrayal of the Dowager Countess.
But it’s also interesting watching the family struggle to come to terms with the new technology of a changing world. They find themselves grappling with telephones, cars, toasters and even the novelty of a hair dryer.
All this technology impacted people. At the start there were dozens of staff - the butler, footmen, valets, lady’s maids, housemaids, kitchen maids, scullery maids. But as new technology appeared it started to obsolete the more junior roles. Vacuum cleaners replaced housemaids. Mixers replaced kitchen maids. Cars replaced the horse and trap.
We watch as the butler, Carson, pines for the old days. But he can’t stop the change; all he can do is choose how quickly to adapt. Downton Abbey is a good reminder that change is not unique to the current day; it’s an inevitable part of life. And software engineering is set to undergo its own Downton moment.
How software gets built
Last week I outlined the disruption that’s coming to the software industry. This week let’s take a look at how this change might play out.
The classic model is (roughly) requirements -> spec -> architecture -> design -> implementation -> test -> deploy. Admittedly it’s never as simple as this; phases often overlap, agile enables more iteration, and testing should wrap around implementation rather than trail it. But it provides a useful way to think about the layers involved in software development.
Pre-AI we increasingly automated the lower levels - deployment and testing have been partially automated for decades now, as has straightforward boilerplate implementation. But AI enables us to extend that existing automation further up the stack. Current tools allow low-level design, implementation and testing to be largely automated - and they can also significantly help with spec, architecture and high-level design.
This is not vibe coding
The core idea underpinning vibe coding is: describe what you want in natural language, let an AI generate the code, and hope for the best. If it doesn’t work keep prompting - and hoping.
The results can be amazing. And terrible. It depends on many factors - many of which aren’t clear. Luck plays its part.
But vibe coding leaves a lot of the potential of agentic development on the table. It doesn’t make use of the decades of experience we have in building software, doesn’t follow a proper software development process, doesn’t lean into testing, doesn’t provide any guarantees about quality.
What we need is a process which takes software development best practice, adjusts it for the new agentic tools and then uses that to build software. This is the combination that unlocks the potential of the new tools - and takes us from vibe coding to something new. Agentic Software Engineering, perhaps?
Agentic software engineering
So what does this new way of software engineering look like?
- Small teams of ~2 engineers.
- Well-defined projects with clear interfaces and minimal interaction between teams.
- Steered agentic workflows that follow the traditional design, implement, test cycle.
- Heavy use of testing, starting with test-driven development (a minimal sketch follows this list).
- High levels of code coverage (~90%+).
- Human attention focused where it provides the most value: architecture, high-level design, AI oversight, AI correction.
- Rapid iteration; multiple prototypes built to test markets.
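To make the test-driven item concrete, here is a minimal sketch of the test-first loop in Rust (the language the quality section returns to below). The function and tests are hypothetical examples, not taken from any real project: the tests come first - written, or generated and reviewed - and the agent iterates on the implementation until they pass.

```rust
// A hedged sketch, not a prescription: a small test-first unit in Rust.
// Function and test names are illustrative only.

/// Normalise a user-supplied email address: trim whitespace, lowercase.
pub fn normalise_email(raw: &str) -> String {
    raw.trim().to_lowercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    // In a test-driven flow these tests exist before the implementation;
    // the agent then iterates until the suite passes.
    #[test]
    fn trims_and_lowercases() {
        assert_eq!(normalise_email("  Alice@Example.COM "), "alice@example.com");
    }

    #[test]
    fn leaves_clean_input_untouched() {
        assert_eq!(normalise_email("bob@example.com"), "bob@example.com");
    }
}
```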
It adapts the old to the new - and provides a framework for what the software engineering teams of the future look like.
Quality
Code review is a frequently discussed subject; some orgs insist that all AI-generated code is human-reviewed. The problem is that code review is hard. It takes somewhere between 50% and 100% of the time it would take to write the code by hand. Worse, reviewing is arguably harder than writing. Experience shows it’s hard for humans to review more than 200-400 lines in one sitting.
When you write code, you build your mental model as you code. Understanding is tied to creation. But, when you review, you have to reverse-engineer that mental model from the code. You’re trying to work out the what and why simultaneously - and that’s mentally tough.
But asking about code review is asking the wrong question. Instead, the question is about quality. Code review is one way we ensure quality. And the AI world creates new options for ensuring quality.
- It becomes practical to use modern safe languages such as Rust. Codex can write Rust code just as easily as Python.
- It’s cheap to build thorough test coverage. Unit tests, functional verification, performance, scale, resilience - AI can build them all. Multiple Codex agents can run through the night building out the coverage; writing tests is something that parallelizes nicely (see the sketch after this list).
- Tests can auto-heal; AI can not just run test suites, it can fix the bugs it finds.
- AI can provide repeated code reviews, and different models, with different blind spots, can review the same code. Manual code review can be reserved for the places where it provides genuine value: critical paths, external interfaces, authentication flows.
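As a hedged illustration of what "thorough coverage" can mean beyond plain unit tests, here is a small property-based test in Rust. It assumes the proptest crate is available as a dev-dependency; the function and the property are hypothetical, not taken from the article.

```rust
// A sketch of broader coverage: a plain unit test plus a property-based
// test. Assumes `proptest` is added as a dev-dependency.
pub fn clamp_percentage(value: i64) -> i64 {
    value.clamp(0, 100)
}

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    #[test]
    fn passes_values_already_in_range_through() {
        assert_eq!(clamp_percentage(42), 42);
    }

    proptest! {
        // The property: whatever input arrives, the result stays in 0..=100.
        #[test]
        fn never_leaves_the_valid_range(value in any::<i64>()) {
            let clamped = clamp_percentage(value);
            prop_assert!((0..=100).contains(&clamped));
        }
    }
}
```

The specific property doesn’t matter; the point is that once tests are cheap to generate, the mix can widen - unit, property, performance, resilience - rather than stopping at the happy path.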
Taken together, these approaches provide a range of new ways to improve code quality. Manual code review was appropriate in the human era of coding; it offered the best return on the time spent. But AI changes that equation; quality can be achieved through different means.
Limitations
It’s not for everything though. Some projects are less about the code and more about co-ordination between multiple teams. Those are not a good fit. Nor are projects where there is no clear definition of success, or those where it is difficult to automate testing.
Personally, I’d still hesitate to use agentic coding for life-critical or performance-critical projects (although that may change as the tools advance - and humans are by no means perfect on these projects either).
And as to the size of the project? For now it probably works best up to the 500 kloc to 1 Mloc range - but that’s a gut feeling from building several 50-200 kloc projects rather than hard empirical evidence; there just hasn’t been enough time since the models’ release to know for sure.
And so?
Many great houses didn’t survive the 20th century; they were demolished, or sold off, or left to decay. Downton survived because the family adapted.
Some software organizations will be the demolished estates. Those who adapt have a chance at becoming Downton.
And Carson? He didn’t vanish with the housemaids. He stayed because his real value was never in polishing the silver; it was his judgment and his knowledge of how things should be done. And he was part of a house that adapted. The uncomfortable reality is that few of us will get to be Carsons; thriving in the new world will require luck as well as ability.


I've been using Claude Code for the last few weeks and I've been massively impressed. As someone who is in meetings a lot of the time, it has enabled me to go from ~2 small PRs a week to ~2 small PRs a day. That's huge!
Places where I've found it really useful are:
- Adding another case of an existing pattern
- Writing all the tests for me if I've hand-written the actual code. I write lots of comments and docstrings saying what the code is meant to do, so the actual prompt I write to Claude is simple. It also fixes all my bugs, which is amazing. That's super useful because writing tests and fixing bugs often takes way longer than writing the code itself.
I'm still not convinced by giving up on code reviews, though. Not because the code written is bad, but because the prompt I've written isn't always clear enough. Then I look at the code and realise: oh, that wasn't what I meant. The other thing reviewing code helps with is that it gives me an opportunity to update my mental model of how the code works; to me, at least, that still has massive value.
I've also never heard of a project where the requirements were clear, you didn't need to talk to other teams/people, and the difficult bit was the implementation. Even more so in the future, I think the difficult thing is going to be defining what you/the client wants and managing the tradeoffs. That has always been the hard bit of software engineering and still seems like a very human problem to me.
I wonder if the trigger for pricing/financial stuff is evidence that the models do this better than humans. Sure, they make mistakes, but is relative (not absolute) performance what matters here? Consider Vending-Bench - Gemini 3 and Opus are turning decent profits in this admittedly theoretical scenario. But what about when Gemini can make you hard cash on the market?
The other angle is the world of plenty we're about to get into. I'm on the train and I've got Claude Code creating apps, well, just because. They're little tools I'd not have bothered with in the past, but now it's easy to experiment and see if they fix a problem. In the past we spent ages on reqs because it was so expensive if we got it wrong. But now?