When my parents were born, computer programming didn’t exist as a career. But by the time I was a teenager, coding was a thriving industry. And for the past 30 years it has given me interesting work and a healthy income.
Yet AI is changing things. I am increasingly convinced that coding will largely disappear as a profession within my lifetime – quite possibly before my parents die. Controversial? No, I don’t think so. Let me explain.
As a kid I played a lot of cards with my family. My favourite game was “Estimation Whist” – the idea is to predict the number of tricks you’ll take, and your score is based on how accurate your prediction was. In retrospect, I wonder if my parents were trying to teach me a lesson about the importance of accurate estimates… but I digress. When I went to university, I couldn’t play with my family, so I did what any, erm, normal person would: I wrote a program to simulate my family.
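For anyone who hasn’t played it: you bid, you play the hand, and your score rewards an accurate bid. Something like this – a purely illustrative Python sketch, since the exact scoring my family used doesn’t matter here:

```python
def score_hand(bid: int, tricks_won: int) -> int:
    """Score one hand of Estimation Whist (illustrative scoring only)."""
    if tricks_won == bid:
        return 10 + tricks_won        # bonus for a perfect estimate
    return -abs(tricks_won - bid)     # lose a point per trick you were out
```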
My game worked pretty well, but I could almost always beat it. I discovered converting my fuzzy card-playing thinking into program logic was hard. Very hard. Well, at least for me, it was very hard. Anyway, since then, I’ve wondered whether a neural network would be able to play better. And last week I decided to find out.
My plan had been to use GitHub Copilot to build my new neural network. Earlier this year I’d used Copilot to build scripts to generate training data, but I’d got stuck writing the PyTorch code for the network. PyTorch, if you’re not familiar with it, is the Python framework for building neural networks. It’s an industry standard, but it’s a complex, unwieldy beast. The current version is ~2.5GB in size. Why does it need to be that big? Good question – I’d like to know, although my instinct says it’s probably bloated… but I digress.
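For a sense of what I’d been stuck on: the network itself isn’t much code. Here’s a minimal PyTorch sketch of the kind of feed-forward policy network I had in mind – the state encoding and layer sizes are my own assumptions, not what o1 eventually produced:

```python
import torch
import torch.nn as nn

class WhistNet(nn.Module):
    """Scores each card position in the hand given an encoded game state.

    The input is a flat encoding of the game so far (cards held, cards played,
    bids, trumps); the output is one logit per card position. All sizes here
    are illustrative assumptions.
    """
    def __init__(self, state_size: int = 180, hand_size: int = 13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, hand_size),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# To pick a card: logits = WhistNet()(encoded_state); card_index = logits.argmax(dim=-1)
```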
GitHub Copilot is remarkable. It can read my mind. Write a comment at the top of a function and Copilot will fill out the rest. Go to add a new line and Copilot will write it before you can. Other times it suggests things I didn’t know I wanted. I love Copilot. But it is just that – a copilot. It gets things wrong. Sometimes really wrong. It needs to be supervised. You need to know the programming language you are using – at least the basics. You need to know how to code.
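To give a flavour of the workflow – this is a made-up example, not Copilot’s actual output – you write the comment, and a plausible body appears:

```python
# Return the winning card of a completed trick: the highest trump if any were
# played, otherwise the highest card of the suit that was led.
def winning_card(trick: list[tuple[str, int]], led_suit: str, trumps: str) -> tuple[str, int]:
    trump_cards = [card for card in trick if card[0] == trumps]
    if trump_cards:
        return max(trump_cards, key=lambda card: card[1])
    followed = [card for card in trick if card[0] == led_suit]
    return max(followed, key=lambda card: card[1])
```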
And it has limitations. Copilot is poor at refactoring across multiple methods. Want to make a change that affects multiple source files? More often than not Copilot will get it wrong and you’ll be left unpicking the mess. It struggles with large codebases – it can’t “see” enough of the codebase to make useful suggestions. But don’t get me wrong. It’s magic when it works.
However, o1 had been released a few days before, and I’d seen a zero-shot demo of it creating a decent implementation of Tetris. Could it create everything I needed? There was only one way to find out…
Over the course of the next few days o1 created all the code I needed. By the end of day three I had a neural network that played Estimation Whist. And played well. When my o1 tokens ran out on day two (there’s currently a limit of 50 queries per week for o1-preview), I tried reverting to GPT-4o. But it was too painful. GPT-4o feels like a toy in comparison to o1. The feeling of withdrawal was so strong I found myself signing up for a second o1 account.
The code almost always worked first time. On the rare occasion it didn’t, I gave o1 the error message and said “Fix it”. And it did. It was amazing.
And it wasn’t just the mainline logic that was quick to generate. Test code was too. I generated lots of tests, on the basis that the more tests I had, the more comfortable I’d be not reviewing the code myself. Then I realised I could get o1 to review its own code – it doesn’t care who wrote the code, so it reviews impartially. It tweaked a few minor things, but decided the code was good.
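Most of the generated tests looked something like this – my own illustration rather than o1’s output, assuming a hypothetical whist module and the illustrative scoring rule sketched earlier:

```python
import pytest
from whist import score_hand  # hypothetical module holding the rules engine

@pytest.mark.parametrize("bid, won, expected", [
    (0, 0, 10),   # an exact bid of zero still earns the bonus
    (3, 3, 13),   # bonus plus one point per trick won
    (2, 5, -3),   # three tricks over the bid
    (4, 1, -3),   # three tricks under the bid
])
def test_score_hand(bid, won, expected):
    assert score_hand(bid, won) == expected
```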
But I couldn’t resist. I sneaked a peek at the code. Hurrah for humans, I thought, when I found a bug. But I was wrong. My ‘fix’ broke the code. I reverted and learnt my lesson. I wasn’t needed.
Then I tried Claude. It was almost as good as o1. It could also write large blocks of code. Wow. Claude doesn’t have the chain-of-thought reasoning capabilities of o1 – what will it be like when it gains them?
With both models I was generating chunks of code about 500 loc in size (yes, I agree, loc is a poor measure, but it’s the best we’ve got). I had to be careful with encapsulation and interfaces to ensure I didn’t give the model too much at once. And the interface to o1 isn’t, err, great – you can’t (yet) attach files, so it’s a case of pasting the code into the prompt. But I could live with that given how awesome it was…
In the final analysis I produced code at least 10-30x faster than I could have written it by hand. And I didn’t need to understand Python. Or PyTorch. Or very much at all. GitHub Copilot suddenly felt very dated.
Three years ago, if you wanted to code, you needed to write every line by hand. You needed to understand the language, the libraries, the interfaces, everything. GitHub Copilot moved the bar. It could write simple methods for you. It understood the details of the language, so you no longer had to.
But o1 moves the bar again. You don’t need to understand languages. You don’t need to work out which libraries are best to use. o1 does it all for you. You are dealing in modules, not lines of code. It’s as if you’ve got a team of capable, enthusiastic, super-fast, incredibly knowledgeable early-in-career engineers. Feed them requirements, interface definitions and data models, and they’ll implement the code. They don’t get tired, they don’t get cranky, they’re always polite and you can (nearly) infinitely clone them.
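In practice, dealing in modules meant handing o1 something like the following and asking it to implement the module plus tests. The names and signatures here are illustrative, not the actual interfaces from my project:

```python
from dataclasses import dataclass, field

@dataclass
class GameState:
    hands: list[list[str]]      # cards held by each player, e.g. "QH"
    bids: list[int]             # each player's predicted trick count
    trumps: str                 # trump suit for this hand
    current_trick: list[str] = field(default_factory=list)  # cards played so far

class Player:
    """Interface every player – human, heuristic or neural net – must satisfy."""

    def bid(self, state: GameState, position: int) -> int:
        """Return this player's bid for the hand."""
        raise NotImplementedError

    def play(self, state: GameState, position: int, legal_cards: list[str]) -> str:
        """Return the card to play, chosen from legal_cards."""
        raise NotImplementedError
```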
The models are only going to get better from here. As Ethan Mollick says, “assume this is the worst model you’ll ever use”. o1 has a context length of 128k tokens. There are roughly 7.5 tokens per loc, so 128k tokens is ~17kloc. The Linux kernel is 139kloc; the largest component in my current work project is 70kloc. War and Peace (~590 thousand words) is ~800k tokens. As of now, those don’t fit.
But Gemini Pro has a context window of 2 million tokens. That’s ~267kloc. Now imagine building a model that has the reasoning of o1, the context size of Gemini and the raw underlying model from Claude. Imagine being able to extend and refactor the Linux codebase without having to understand any of the underlying details. Such a model is coming; we have the pieces – they just need to be plugged together.
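The back-of-envelope conversion, using the same ~7.5 tokens-per-line ratio as above:

```python
TOKENS_PER_LOC = 7.5  # rough ratio quoted above

def loc_capacity(context_tokens: int) -> int:
    """Roughly how many lines of code fit in a given context window."""
    return round(context_tokens / TOKENS_PER_LOC)

print(loc_capacity(128_000))    # o1:         ~17,000 loc
print(loc_capacity(2_000_000))  # Gemini Pro: ~267,000 loc
```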
What does this mean?
Coding is going to die. We just won’t need to do it anymore. It reminds me of the widespread adoption of compilers in the early eighties. Before then folk mostly coded in assembler. Compilers changed that. They made it easier to code. For a while folk checked the compiler outputs, but soon they began to trust. Nowadays we never check the compiler output. The same will happen with LLMs. In the next year we’ll move to writing modules rather than code. We’ll still review and check the output. But we’ll learn to trust and then we won’t review the code anymore.
Some folk think there will be an artisan market for coding. They are wrong. There might be an artisan market for chairs, but there isn’t one for hand-written assembler and there won’t be one for hand-written code.
The UIs will change. VS Code has the wrong interface for the new world. Earlier this week I tried out Cursor. I expected to be impressed. But I was disappointed. Sure, it’s better than GitHub Copilot. But it doesn’t beat o1. And the Composer feature – the thing that would allow you to build at a module level – is hidden away. Cursor has the wrong interface. It’ll be left behind – it’s not enough to be an incremental improvement on earlier tech when the paradigm shifts.
GitHub Copilot and Cursor represent the first generation of AI automation – “Assistants”. They are good at working within a single method. They speed up traditional coding. This is where we are today.
The second generation is what I think of as “Composition” – models producing modules of up to 1kloc at a time. The code within the modules is consistent and works. New tools will be built to speed up composition. We’re on the cusp of this world today.
The third generation is “Workflow”. Give a model the requirements for a product and it’ll go and build it. The tools create full solutions. Devin, Devika and the like are headed in this direction. Design, code, test, debug, fix, spec: all of it will be automated. How far away is this? Who knows, but it feels a lot closer than it used to be.
And when code is so cheap to produce, many of the rules and beliefs we hold true will break. Prototyping becomes next to free. We can do more prototyping and build better products. Iteration is cheap. Why use open source when you can build exactly what you need? Open source inevitably pulls in unnecessary functionality and increases your attack surface. If you only build what you need, is the result more secure software?
Do we end up with declarative code? Currently fixing a bug requires carefully modifying existing source files. But if we’ve got awesome tests and fantastic code reviewers, do we move to a world where we replace rather than mend? Do we make code more secure by having code that is continually being rewritten and evolving?
If we’re building resilient software, do we create multiple, different implementations to get high availability? If we’ve got tools that can build solutions cheaply, then the answer is probably yes.
What does this mean for CS degree courses? What does it mean for the current set of graduates? Sadly, I suspect they are spending their time learning obsolete skills. No longer will it matter whether you have experience with Python, C or Algol 68. And if you are spending time practising LeetCode then I strongly suggest you stop and go and do something – anything – else instead. Trust me, it’ll turn out to be a better use of your time in the long run.
I didn’t realise it at the time, but I’ve lived through the golden age of hand-written code. That era is coming to a (rapid) end. It’s not going to be the same again.
Great article, though I don’t agree with everything. I suspect open source and some hand-written (or at least hand-reviewed) code will continue, because software is inherently complex and modularity will always be more economic. In other words, AI will allow software projects to get larger, but it will never be economic (or secure) to build all the building blocks from scratch every time – just as new research in biology doesn’t reinvent from scratch all the chemistry it depends on. No doubt AI will soon be able to build using open source pieces too.