Am I delusional about AI making me faster?
Unpacking the study that found developers were 19% slower with AI tools
A recent METR study investigated how much AI coding tools speed up experienced open-source developers. The headline from the study? Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Yikes. AI makes you slower. Can that really be true?
Am I delusional?
I’ve generated a lot of code using AI in the past year. And I’m confident I’ve been faster than I would otherwise have been. Or have I been dreaming?
One big difference between my experience and the study is that the study focuses on modifications to large, legacy codebases. I’ve been (mostly) working with smaller, new codebases. That’s important for a couple of reasons.
Firstly, modifying code is quite different from creating it, especially from an AI’s perspective. For creation, the model likely already has much of the required context embedded in its weights - it knows the language, the interfaces, the tools. It can use standard patterns and algorithms. So you can often get good results from a lightweight prompt.
But successfully modifying a codebase requires a lot more context. How is the code architected? What are the project conventions? What’s the historical context - why were things done the way they were? Understanding all this is essential to modifying the code successfully, and it’s often hard or impossible to reconstruct it from the code alone.
Secondly, there’s the sheer size factor. The codebases in the study averaged 1.1M lines of code - beyond what fits in today’s context windows. Models are inevitably looking at a subset of the code. And that can be quite a small subset.
Take Cursor. When making changes it looks at your current file, any explicitly included files, recent files, and some other bits and pieces. Maybe 20 files at 500 lines per file - so 10kloc, roughly 1% of the codebase.
One of the comments from the report was:
“AI doesn’t pick the right location to make the edits.”
Perhaps that’s not surprising.
So what should you do?
Providing the right context is critical. Recently I’ve been using Claude with MCP on a legacy codebase. I started out by getting Claude to help me explore the codebase: build an architecture diagram, understand the key flows and key data structures. Then I turned this into a custom briefing prompt that I give the model at the start of each session. It all lives in a Claude project, so I can add relevant design docs, API specs and coding standards - and reference them from the custom system prompt.
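The exploration prompts don’t need to be sophisticated. Something along these lines (paraphrased for illustration, not my exact wording) is enough to start building the picture:

```
Walk me through this codebase. Start from the entry points, then map the
main modules and how data flows between them. Produce an architecture
diagram and a list of the key data structures, with a one-line purpose
for each.
```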
There are two levels here:
Things I want the model to know about all the time. Those go in my custom system prompt. Really important things get repeated: as the context grows, models often lose sight of earlier instructions, and repetition seems to minimize this. It’s what the labs do in their own system prompts, though there doesn’t seem to be any truly effective way of stopping models losing focus.
Things I might want the model to be aware of depending on the problem I’m solving. Those go in the Claude project so Claude can access them if it needs them, but they don’t clutter up the context window if they aren’t needed.
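To make that concrete, here’s the rough shape of such a briefing prompt. The project details below are invented for illustration - the structure is what matters:

```
# Project briefing (always in context)

## Architecture
- Monolith with a services layer; all database access goes through
  repository classes.
- Background work runs via a job queue; never do slow work in request
  handlers.

## Conventions
- Follow the existing error-handling pattern: return result objects,
  don't throw.
- New code needs unit tests alongside it, matching the existing layout.

## IMPORTANT (repeated, because models lose focus as context grows)
- All database access goes through repository classes.
- Ask before making cross-module changes.

## Reference material (in the project files - read when relevant)
- architecture-overview.md - component diagram and key flows
- api-spec.yaml - external API contracts
- coding-standards.md - style and review checklist
```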
If you’re using Cursor you can achieve something similar with Cursor rules. But whatever tool you use, planning and building this scaffolding is critical to getting good results. It’s just like working with a new joiner to the team: you need to brief them and signpost resources and tools before they can start to make useful changes.
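In Cursor that scaffolding lives in project rules files, e.g. under .cursor/rules. A minimal sketch with a hypothetical project layout (check the current Cursor docs for the exact frontmatter fields):

```
---
description: Data-layer conventions for this codebase
globs: ["src/data/**"]
alwaysApply: false
---

- All database access goes through the repository classes in
  src/data/repositories/.
- Migrations are generated, never hand-written - see docs/migrations.md.
- If you're unsure where an edit belongs, read
  docs/architecture-overview.md first.
```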
It’s unclear how much of this configuration the developers in the study created; the implication is very little - there’s no mention of custom instructions, system prompts or Cursor rules. Interestingly, the study notes that "unused elicitation strategies could improve AI reliability" - but doesn’t explore them. That feels like a missed opportunity to find out what the tools can do when used well.
The developers were also relatively inexperienced with AI tools - and notably, the most experienced was the one who made a genuine 20% productivity gain. But that’s a single data point, so no meaningful conclusions can be drawn from it.
Comparison points
The study is focused on experienced developers. But perhaps a more useful comparison would have been how AI helps inexperienced developers ramp up.
There’s little hard data, but 3-6 months to reach ~50% productivity and 1-2 years to reach full productivity are common rules of thumb. Does AI shorten that ramp-up curve? Does it get developers to full productivity sooner? Or consider contractors: does access to AI make them productive more quickly?
My expectation - and experience - is that AI does help here. And the study hints at possible benefits, though without providing any definitive answers.
And then there are the questions the study raises:
Developers were paid by the hour; there was no incentive to push for maximum efficiency. The authors note that "…paying developers per hour overall has minimal effect on their behavior." But is that really true?
Did AI latency play a role? The study looked at this: some users reported browsing Twitter while they waited; others claimed they never waited more than 20 seconds. My experience is that once you’re providing sufficient context, latency does become an issue - models can take many minutes to respond, especially if you’re using MCPs or command-line agents. So distraction is real, and I often end up running multiple tasks in parallel.
The study used Cursor with Claude 3.5/3.7. But things move fast: we’ve now got Claude 4, Gemini 2.5, and increasingly capable agentic tools. How would those change the results?
And so?
The study is fascinating - not just for what it finds, but for the questions it raises. Working productively on legacy codebases is hard, and AI offers the promise of making it easier. At a superficial level the study suggests AI fails to deliver on that promise - but the methodology didn’t take full advantage of the tools. So I’m left wondering what the study actually tells me.
But maybe the study was asking the wrong question. Instead of 'does AI make developers faster?', we should ask: 'how do we make AI make developers faster?' This study suggests the answer isn't in the models themselves, but in how we integrate them into our workflows.

