Improving codebases with AI
Can an LLM improve a legacy codebase?
Following on from whether LLMs can help us understand legacy codebases, the next question is whether they can improve them. Let’s find out.
As before, our plucky test subject is Estimation Whist: a veteran of 30+ years, with 2.9kloc written in Turbo Pascal for Windows. Claude described it thus:
The code is well-structured and documented, suggesting careful planning and professional development practices, which would have extended the timeline but improved maintainability.
Conclusions
Details of my testing are below. But if you’re short of time, the spoiler is: sort of. LLMs are definitely not a one-shot deal. With a bit of effort and some tedious manual merging you can improve legacy code. But it’s not fun - yet.
On the other hand, Claude is fantastic at creating custom small tools. And Rust is an excellent language for this. If you’re not doing this already, then I strongly advise you to start using Rust to build tools - even if you’re unfamiliar with Rust.
Claude - improving the code
I asked Claude how to improve the code. It had lots of ideas.
I asked it to produce updated code and it obliged. But it was very difficult to integrate. By default Claude provides an updated file that mixes new code with existing code. And often the existing code is modified like this:
PROCEDURE TMainWindow.StartGame;
BEGIN
  InitializeAI; // Initialize AI systems
  // Original StartGame code follows...
END;
Many times it isn’t clear where to merge the changes in.
In this example, Claude wants to add a call to the UpdateCardMemory function from PlayCard. And this new call should go somewhere between the existing code and the rest of the existing code. That’s not very helpful.
This makes merging the changes somewhat challenging. Worse, even if you get the merge correct there’s no guarantee the changes will actually work. I’ve previously wasted time integrating plausible looking AI generated code that didn’t work. Net, I wasn’t keen on manually merging. There’s got to be a better way.
My first approach was to tell Claude to produce a complete, consolidated updated source file. But it seems producing 115KB of output is too much - Claude consistently failed to give a complete file. OK. So if I’m stuck with partial updates what else can I try? Perhaps Claude could generate an update script to apply the diffs?
Pleasingly that worked. Claude happily produced an update script. But the script didn’t work properly - rather than add code, it removed large chunks of the existing code. I persevered for a while before having to accept this approach is beyond the current capability of Claude.
Onto another idea. Perhaps I could get Claude to produce the updates in unified diff format. Then I could apply them with the patch tool in GitHub.
This worked for the first, simple, change I made - updating the date stamps.
Hurrah!
Unfortunately anything more complicated failed.
It also got confused about the capabilities of Turbo Pascal - one time it decided to introduce TRY/FINALLY exception handling. That wasn’t supported until Borland moved to Delphi.
So a fail. But out of interest I decided to try manually merging some code…
Claude - manually merging
Merging in diffs from Claude varied from impossible to not-too-bad. Sometimes it wasn’t clear where to merge in the code, but most of the time it provided complete functions to merge in. Which, while tedious, was doable.
The good news was the code mostly worked. The bad - while it talked convincingly about improving the quality of the bidding logic in Estimation Whist, I remain unconvinced it is significantly better. Different, yes. More complex, yes. Better? Marginally, at best.
What about o1?
I tried o1 as well, but I was disappointed.
The first problem was size. The 115KB file was too big for o1. After some refactoring I had the main game logic file down to 64KB. I gave it to o1 and it produced a set of diffs (sigh) which it confidently predicted would significantly improve the bidding logic. I merged them in - and the code crashed with a divide by zero.
The second problem was time. I was a fan of o1 when it first appeared. But it is slow. It can take up to 100 seconds to get an answer. And following the recent upgrade of Claude from version 3.5 to, err, 3.5 the o1 answer doesn’t seem any better than Claude’s. If o1 reliably gave better answers then I’d sacrifice the time. But it doesn’t, so Claude is more appealing and my use of o1 has dropped off.
I quickly abandoned the investigation of o1.
Claude - building tools
As an aside, one thing Claude excelled at was building small helper tools. For some reason one of the versions of the update script output the code with Unix line-endings. Now, I could have used ‘unix2dos’ to convert them. But, instead, I got Claude to create my own versions in Rust. Claude happily produced the code, wrote unit tests (UTs) and 10 minutes later I had my own versions of unix2dos. OK - maybe not the best example - but the point is AI makes it trivial to create custom tools.
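To give a flavour of how small such a tool can be, here is a minimal sketch of the core of a unix2dos-style converter. It is not the code Claude produced for me: the function name unix_to_dos is made up for illustration, and the file reading and writing around it is left out.

```rust
/// Convert Unix (LF) line endings to DOS (CRLF).
/// Hypothetical helper for illustration only.
/// Existing CRLF pairs are left alone, so running it twice is harmless.
fn unix_to_dos(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut prev = 0u8; // previous byte seen; 0 before the first byte
    for &b in input {
        // Only add a '\r' when this '\n' is not already part of "\r\n".
        if b == b'\n' && prev != b'\r' {
            out.push(b'\r');
        }
        out.push(b);
        prev = b;
    }
    out
}
```

Working on bytes rather than a String means the function does not care about the file's text encoding, which seemed the safer choice for 30-year-old source files.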
And Rust is the best language to use. Why?
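Rust has a very strict compiler. This catches many errors that would slip through in other languages. For example, if your code compiles (and avoids unsafe blocks) you can be confident there are no use-after-free bugs, null pointer dereferences or data races, and an out-of-bounds access stops the program with a panic rather than silently overflowing a buffer. Memory leaks and deadlocks are still possible, though unlikely in a small tool.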
Rust has a very nice framework for UTs. It’s trivial to get Claude to generate comprehensive UTs alongside the app (just tell Claude to “include comprehensive UTs” in your prompt).
Cargo is a nice build environment. Install the compiler, then run cargo new <tool-name> to get a new project. Copy the code Claude produces to main.rs in the src directory, run cargo test and, all being well, you have a tool. You might need an updated Cargo.toml file (to pull in any dependencies). And you might want to run cargo build --release to get a release version.
The combination makes it easier to trust the code Claude produces. If the Rust compiler is happy and the UTs look sensible and the UTs pass then there’s a very good chance the code is good.
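For anyone who hasn’t seen Rust UTs, this is roughly what they look like. It is a sketch rather than one of my actual tools: dos_to_unix and the test names are invented for the example. The tests live in the same file as the code, and cargo test finds and runs them.

```rust
/// Convert DOS (CRLF) line endings to Unix (LF).
/// Hypothetical helper for illustration only.
/// A lone '\r' that is not followed by '\n' is kept as-is.
fn dos_to_unix(input: &str) -> String {
    input.replace("\r\n", "\n")
}

// This module is only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn converts_crlf_to_lf() {
        assert_eq!(dos_to_unix("a\r\nb\r\n"), "a\nb\n");
    }

    #[test]
    fn leaves_unix_text_unchanged() {
        assert_eq!(dos_to_unix("a\nb\n"), "a\nb\n");
    }

    #[test]
    fn keeps_lone_carriage_return() {
        assert_eq!(dos_to_unix("a\rb"), "a\rb");
    }
}
```

Reading the test names is usually enough to judge whether the UTs are sensible, which is most of the review effort for a tool this size.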
Over the past week I’ve created various Rust tools:
pdf2txt to convert pdf to text files.
pdfobfuscate to identify and obfuscate product/company names in docs.
finalframe to grab the final frame from an mp4 video.
mp4reverse to reverse an mp4 video.
txtinfo to count words in a text file and how many are unique.
If I had time I’d create even more :).
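As an illustration of the scale involved, the heart of something like txtinfo fits in a couple of dozen lines. This is a sketch, not the generated code: the name word_stats and the definition of a "word" (a whitespace-separated token, lowercased, with surrounding punctuation trimmed) are my assumptions for the example.

```rust
use std::collections::HashSet;

/// Returns (total words, unique words) for a piece of text.
/// Hypothetical helper for illustration only.
/// Assumption: a "word" is a whitespace-separated token, compared
/// case-insensitively after trimming leading/trailing punctuation.
fn word_stats(text: &str) -> (usize, usize) {
    let mut total = 0;
    let mut unique = HashSet::new();
    for token in text.split_whitespace() {
        let word = token
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if word.is_empty() {
            continue; // token was pure punctuation, e.g. "-"
        }
        total += 1;
        unique.insert(word);
    }
    (total, unique.len())
}
```

A different definition of "word" (keeping case, say, or splitting on hyphens) would give different counts, so that is the first thing to spell out in the prompt.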





