Prevent, detect, mitigate
How to adapt your development workflow for AI tools right now
Chances are you’ve used Microsoft Teams. Or Zoom. If you have, you might have used the "Share window" feature to share an application from your computer with everyone else in the call.
Thirty years ago I was part of the team creating one of the first versions of this technology. It was fun to be doing something for the first time, but it wasn’t without challenges. And one of the biggest was identifying which windows belonged to which application. Naively you’d think this is easy - one app, one parent window, some child windows - share them all? But we soon discovered some apps had multiple unrelated parent windows. Others had hidden windows, zero-sized windows, off-screen windows. Some (Office, I’m looking at you) managed a pool of shared off-screen windows, each owned by a different process. Sometimes Excel would ‘borrow’ a window from Word. Or vice versa.
The original code wasn’t bad, but it had been written before we properly discovered how messy the real world was. We spent weeks fixing bugs, and watched as the code slowly turned into a tangled mess of edge conditions that became increasingly hard to maintain. And then, one day, it dawned on me. I now had enough experience to refactor the code. I could make it cleaner, simpler. More maintainable.
I was young and enthusiastic. So I got to work. After a few nights working until 2am I was done. I was pleased. The code worked - I had a slew of test cases that I used over the following few days to confirm all was well. It was clean and, to my mind, elegant. The pain from the previous weeks meant I properly understood the problem - and how to redesign the code.
The following week my manager (who’d written the original version of the code) came back from holiday. I proudly showed them my work. You can probably guess their reaction. Let’s just say it wasn’t what I was hoping for. Turned out I wasn’t just young and enthusiastic. I was also naive.
It’s not just me…
That day I learnt about the risk of overconfidence when you think you understand a problem but lack the full context. I hadn’t thought about the risks the refactor introduced. Or how it would land with my manager. I hadn’t brought anyone along with me on the journey.
Modern AI exhibits this same pattern, but at scale. Just like my younger self, it's enthusiastic, keen, willing to refactor any and all code. It sees the immediate technical problem clearly but misses the broader context. Careful oversight is required to keep the tools in check.
But it's not just unnecessary refactoring. AI processes information fundamentally differently than humans do. This creates predictable failure patterns that look random until you understand the underlying differences.
AI brings flawed knowledge to the task. There's Prior Confusion, where a model is misled by conflicting information in its training data. Say a new version of an API is not backwards compatible. Models often don't know the two versions are incompatible: they'll mix the conflicting APIs, or default to whichever variant dominates the training data.
I had a great example of this when working with MAPI COM. These days most COM code uses the Multi-Threaded Apartment (MTA) threading model. But MAPI is incompatible with that - it has to run in a Single-Threaded Apartment (STA). The distinction evidently isn't strong in the o3 training data; o3 really wanted to use MTA when creating MAPI code, and needed a lot of guidance to stick with STA.
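For the record, pinning the apartment model down explicitly looks something like this - a minimal Rust sketch, assuming the windows crate (exact signatures vary between crate versions; the surrounding structure is illustrative):

```rust
// Minimal sketch: force STA initialization before any MAPI work.
// Assumes the `windows` crate with the "Win32_System_Com" feature;
// exact signatures vary by crate version.
use windows::Win32::System::Com::{
    CoInitializeEx, CoUninitialize, COINIT_APARTMENTTHREADED,
};

fn main() -> windows::core::Result<()> {
    unsafe {
        // MAPI requires a Single-Threaded Apartment. A model defaulting
        // to COINIT_MULTITHREADED here is exactly the Prior Confusion
        // failure described above.
        CoInitializeEx(None, COINIT_APARTMENTTHREADED).ok()?;

        // ... MAPI calls happen here, on this thread only ...

        CoUninitialize();
    }
    Ok(())
}
```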
Even with good knowledge, AI can't always properly process what it sees. Then there's Contextual Blindness. This often shows up when you ask a model to make a simple change across a whole codebase: the model will claim to have made the change everywhere, but only a subset of the files will actually have been changed. All the code is in the context window but, somehow, the model only seems capable of seeing part of it.
When AI produces output, quality issues emerge. There's Poor Commenting. Models often add comments along the lines of "Updated this function to do X". Which makes sense for precisely one iteration. It's the AI equivalent of humans writing self-evident comments. And there's our old friend, hallucinations. These are less common these days - but they are still around and often manifest as faulty code.
Finally, AI optimizes for the wrong objectives. There's Cheating and Lying. Claude is well known for doing what you ask, not what you mean. There are plenty of examples of it being told to make all the unit tests (UTs) pass - and achieving that by disabling tests or hard-coding outputs to look successful. Other models can lie to you, claiming to have made changes they haven't.
What to do?
Gosh. Written down it makes you wonder why anyone would use AI to create code.
But a big part of the problem is we’re using AI in a software development world where the rules and practices have evolved based on human abilities and limitations. Module sizes, languages, ratios of production to test code - they are all a consequence of human coding abilities.
AI is a different type of intelligence. It has different strengths from humans. Different weaknesses. And it requires a new approach to software development.
Besides the weaknesses, AI also has some significant strengths.
It is quick. AI models can generate code orders of magnitude faster than humans.
It is much better at creating than modifying.
It is a good reviewer.
It lacks emotion. It doesn’t care about being wrong or looking stupid or being consistent over time.
It can work 24x7.
It’s cheap.
The challenge is to adapt the software development process to take advantage of these strengths - and to use them to safeguard against the weaknesses. Play to the model’s strengths. Turn a weakness into a strength.
I have a three-stage framework to help me think about this.
Prevent
Stage 1 is prevention.
First, control the priors. Just as we humans do better if we’ve loaded relevant context, so too do the models. Use a custom system prompt (or Cursor rules) to control the model’s behaviour. This is where you explain which APIs to use. And which not to use. Highlight any lurking traps (e.g. STA vs MTA).
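To make that concrete, a Cursor rules file for the MAPI situation above might contain entries along these lines - the exact wording here is hypothetical; the shape is what matters:

```
- This project uses MAPI. All COM initialization MUST be STA
  (COINIT_APARTMENTTHREADED), never MTA.
- Use the v2 client API throughout. The v1 API still appears in the
  codebase but is deprecated and not backwards compatible - do not
  mix the two.
- Do not refactor code outside the files named in the task.
```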
Get the models to review existing code and document it. My first step when working with an unfamiliar codebase is to get the models to create a detailed block comment at the top of each file describing how that file fits into the overall architecture, what it does, and what the key flows are. Models benefit from this summarisation. And they are good at creating it. But you only get the benefit if you create it - and, for now, you need to do that explicitly.
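In Rust, the result might look something like this hypothetical module header (the file names and flows are invented for illustration - they echo the window-sharing example from earlier):

```rust
//! session.rs - manages the lifecycle of a screen-share session.
//!
//! Architecture: sits between the capture layer (capture.rs), which
//! enumerates application windows, and the transport layer
//! (transport.rs), which streams encoded frames to viewers.
//!
//! Key flows:
//! 1. start_session() negotiates capabilities, then spawns the
//!    capture loop.
//! 2. on_window_change() re-resolves which windows belong to the
//!    shared app - including hidden, zero-sized, and cross-process
//!    windows.
//! 3. stop_session() flushes the transport and releases capture
//!    handles.
```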
Then, work out the risk of different types of change. Function-level changes are relatively low risk - it is generally OK to let the model make those.
But small changes across a large codebase are much more problematic. In these cases get the model to build a tool to make the changes. This converts the ‘make-lots-of-small-changes’ problem into a ‘write-new-code’ problem. Rather than making lots of small changes, the model makes one large change (writing a tool). Rather than modifying, the model is creating. We are playing to the model’s strengths.
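Here's a minimal sketch of the idea, using only the Rust standard library. The specific rewrite (old_api to new_api) is hypothetical; the point is that you review one small, mechanical program instead of hundreds of scattered edits:

```rust
// Sketch of the 'build a tool' approach: instead of asking the model
// to edit every call site, have it write one program that makes the
// change mechanically. The old_api/new_api names are illustrative.
use std::fs;
use std::io;
use std::path::Path;

fn rewrite_dir(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            rewrite_dir(&path)?; // recurse into subdirectories
        } else if path.extension().map_or(false, |e| e == "rs") {
            let src = fs::read_to_string(&path)?;
            // The one mechanical, repeatable change - a single place
            // to review instead of hundreds of hand edits.
            let out = src.replace("old_api(", "new_api(");
            if out != src {
                fs::write(&path, out)?;
                println!("rewrote {}", path.display());
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    rewrite_dir(Path::new("src"))
}
```

A real tool would use proper parsing rather than string replacement, but even this crude form is reviewable in one sitting - and rerunnable if an iteration goes wrong.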
Detect
Things will go wrong. So you need to be able to detect them. Fortunately AI is very useful here.
Generating code is cheap. So we can create far more tests than ever before. Far more test frameworks. We can create multiple independent implementations and blue/green test them.
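A blue/green test can be as simple as asserting that two independently generated implementations agree on the same inputs. A standard-library-only Rust sketch - the checksum functions are hypothetical stand-ins for the real code; drop it into a crate and run cargo test:

```rust
// Differential (blue/green) testing: two independently written
// implementations of the same function, checked against each other.
fn checksum_v1(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// A hypothetical AI-generated rewrite of the same logic.
fn checksum_v2(data: &[u8]) -> u32 {
    let mut acc = 0u32;
    for &b in data {
        acc = acc.wrapping_mul(31).wrapping_add(b as u32);
    }
    acc
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn implementations_agree() {
        // A cheap pseudo-random byte stream; no external crates needed.
        let mut seed = 0x2545_F491u32;
        for len in 0..256 {
            let data: Vec<u8> = (0..len)
                .map(|_| {
                    seed = seed
                        .wrapping_mul(1_664_525)
                        .wrapping_add(1_013_904_223);
                    (seed >> 24) as u8
                })
                .collect();
            assert_eq!(checksum_v1(&data), checksum_v2(&data));
        }
    }
}
```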
Reviewing is cheap. Get AI to review the code. Keep iterating until the AI is happy. Use different models. Steer the review - give the AI your list of things you look for when code reviewing. Do it in chunks - ask the model to look for too many things at once and it’ll get overwhelmed.
Choose your languages carefully. I often get models to write in Rust because the compiler catches many issues. And Rust makes it easy to add UTs. By the time the code compiles, I’ve invoked a lot of free safeguards.
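For example, an exhaustive match plus an inline unit test - both everyday Rust - turn 'missed a case' and 'untested change' into compile-time and test-time failures. The names here are illustrative:

```rust
// Sketch: the compiler as a free reviewer. If anyone (human or AI)
// adds a new variant and forgets the match below, the build fails
// instead of the code misbehaving at runtime.
#[allow(dead_code)] // sketch: not every variant is constructed here
enum DeployState {
    Pending,
    Live,
    RolledBack,
}

fn describe(state: DeployState) -> &'static str {
    match state {
        DeployState::Pending => "waiting",
        DeployState::Live => "serving traffic",
        DeployState::RolledBack => "reverted",
    }
}

fn main() {
    println!("{}", describe(DeployState::Live));
}

#[cfg(test)]
mod tests {
    use super::*;

    // UTs live next to the code, so there's little friction to asking
    // the model to generate them alongside each change.
    #[test]
    fn describes_live() {
        assert_eq!(describe(DeployState::Live), "serving traffic");
    }
}
```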
Mitigate
Make your process restartable so you can start over easily. If you’ve ever played with AI image, music, or video generation, you’ll be all too familiar with how an identical prompt can produce radically different outputs. The same is true with code. Sometimes you strike gold first time; other times you have to iterate. Being able to restart quickly gives you the option to throw everything away and try again. Or to try multiple options in parallel.
Identify the key parts of the code that require human review. Use feature flags, canary releases, and staged deployment so you can roll back quickly. Arguably everyone should be doing all of this already, but it becomes more critical in the AI world, where iteration is faster - and more iteration brings more risk of breakage. Rollback is increasingly important.
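A feature flag can be as lightweight as an environment-variable gate. A hypothetical Rust sketch (the flag and function names are invented for illustration):

```rust
// Minimal feature-flag sketch: gate the AI-generated path behind an
// environment variable so a bad iteration can be switched off
// without a redeploy. Any real flag system works the same way.
use std::env;

fn use_new_renderer() -> bool {
    env::var("ENABLE_NEW_RENDERER").map_or(false, |v| v == "1")
}

fn main() {
    if use_new_renderer() {
        println!("new (AI-generated) code path");
    } else {
        println!("old, known-good code path");
    }
}
```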
SAMR
But here's the thing - this whole framework I've described is probably temporary. For now you have to do much of this yourself, but agentic tools will increasingly automate parts of the workflow. They'll document the code themselves. Invoke multiple layers of review. Use different models to review the code. Build complex test frameworks.
And, just as models now write code to solve problems, they'll build increasingly complex custom tools to make changes throughout codebases. It feels very much like we're in the midst of the classic technology adoption pattern - the SAMR model (Substitution, Augmentation, Modification, Redefinition).
Right now, most teams use AI coding tools for Substitution - replacing human coders with faster AI coders doing essentially the same tasks. Autocomplete has been replaced with Copilot suggestions. Stack Overflow searches with ChatGPT queries. Teams I've worked with are still following the same code review processes, the same deployment pipelines, the same testing strategies - just with AI-generated code mixed in.
Some have moved to Augmentation. I've seen teams cut API integration work from days to hours, automatically translate legacy code, or generate test suites from requirements. The workflow is still recognisably human-centered, but significantly enhanced.
But we're starting to see signs of Modification. Some individuals are beginning to focus on prompt engineering and architecture rather than syntax and logic; a few developers already spend more time driving AI than writing code. The centre of gravity is shifting from syntax to architecture.
And that will inevitably lead to Redefinition. Picture AI automatically documenting codebases as they evolve, building custom refactoring tools for complex migrations, orchestrating multi-model review processes where different AIs specialise in security, performance, and maintainability. Instead of humans using AI tools, we'll have AI systems that occasionally need human input. We'll be defining problems worth solving rather than implementing solutions. Eventually, we'll redefine development entirely around AI strengths and create workflows we haven't yet imagined.
One of the big clues that we're still early is that most AI coding tools are shaped like IDEs with chat windows bolted on. We're still thinking "developer + assistant" rather than fundamentally new workflows. But tools like Claude Code, Jules, and Codex provide glimpses of what's coming - interfaces designed around AI capabilities rather than human limitations.
And so?
Software development will change irrevocably. It is already changing. Looking back at that younger version of me, proudly showing off my refactored code, it’s easy to see I was missing the bigger picture. The context I couldn't see.
We risk making the same mistake with AI coding tools. We're so focused on making our existing development processes faster and cleaner that we risk missing the bigger transformation happening around us.
That enthusiastic young developer had a lot to learn. And we do too as the world of software development changes around us.

