Code's final chapter

Why o3-mini marks the end of hand-written software

Feb 04, 2025

Last Friday I wrote about the AI tsunami. About the deluge of new models and the challenges of keeping up with the jagged frontier.

And, right on cue, just as the post went live, OpenAI released o3-mini, their new reasoning model. A release brought forward as a direct response to DeepSeek. A way for OpenAI to demonstrate that the US retains a commanding lead in AI.

But we didn’t get just one model. We got four. With another two still to come over the next couple of months. So what did we get?

o3-mini-low
o3-mini-(medium)
o3-mini-high
Deep Research

Free users get access to a limited amount of o3-mini-low. Plus & Pro subscribers get access to varying amounts of o3-mini, o3-mini-high and Deep Research. Confused?

I’ve spent the past few days coding with o3-mini. I’ll be honest. I’m very impressed. And also a little terrified.

o3-mini

o3-mini is the lightweight version of the as-yet-unreleased o3. It trades some quality for speed. It’s a reasoning model, i.e. it spends time thinking about the answer before replying. The low, medium and high variants determine how much time it spends thinking: low is quick; high is slower and more thoughtful.

The evidence so far is that it’s a coding powerhouse. Here are the results for Codeforces. Recall that the unreleased full o3 has a rating of 2,727 (within the top ~0.2% of human coders). o3-mini-high has a rating of 2,355. That’s lower, but still good enough to put it in the top 2% of human coders.

It’s also good at math. Very good. Take the FrontierMath benchmark. It is designed to be very hard, demanding “theoretical understanding, creative insight, and specialized expertise, often requiring multiple hours of effort from expert mathematicians to solve.” It’s worth looking at some of the example problems. I’ll be honest - they are beyond me.

Full o3 scored 25%. o3-mini-high, when given access to Python tools, solved 32% of the problems. Wow.

And so?

The previous model, o1, was pretty good at coding. It was within the top 3% of coders on Codeforces. But it was slow. It could take five or more minutes to answer. And that wasn’t a great experience, especially when you need to iterate on a problem. It meant a clunky workflow: I’d use Claude to write a prompt to give to o1 and then hope I didn’t have to go through too many five-minute cycles.

But all variants of o3-mini are blisteringly fast. Here’s o3-mini writing Breakout in Python. And I’ve seen it go faster, with the code scrolling continuously up the screen at high speed. It’s astonishing.

And it worked. o3-mini is very good at one-shotting (i.e. getting the correct answer the first time).

The future, today

I spent yesterday working on a new tool in C# to process emails. I’d never used C# before, never worked with COM, never used MAPI. But within a day I had 800 lines of working C#. The task was done. It helped that I had already worked out the requirements for the tool and could articulate them clearly. Nonetheless, a task that would have taken two weeks or more was done in under a day.

But, to be honest, most of the time was spent testing, because processing 10,000 emails is slow. And Outlook’s COM interface is, well, unreliable. So I used the time spent waiting on tests to build some other tools:

I updated my kloc counter program to add support for C#, Pascal & Ruby because, well, why not?
I created a tool that merges .docx files into a single large text file.
I built a throwaway card game.
I worked on an alternate implementation using C++ and MAPI.
I worked on part 2 of the tool to filter emails.
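The kloc counter itself isn’t shown here. As a rough illustration only, a minimal sketch of that kind of tool might map file extensions to languages and tally non-blank lines per language. The extension table, the `count_loc` name and the rule of counting only non-blank lines are hypothetical choices, not the actual program.

```python
import os
from collections import Counter

# Hypothetical extension-to-language table; .cs, .pas and .rb cover the
# C#, Pascal and Ruby support described above.
LANGUAGES = {
    ".py": "Python",
    ".c": "C",
    ".cpp": "C++",
    ".cs": "C#",
    ".pas": "Pascal",
    ".rb": "Ruby",
}


def count_loc(root: str) -> Counter:
    """Count non-blank lines per language for every recognised file under root."""
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match extensions case-insensitively, so FOO.PAS counts as Pascal.
            ext = os.path.splitext(name)[1].lower()
            language = LANGUAGES.get(ext)
            if language is None:
                continue  # skip files with an unrecognised extension
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                # A line counts only if it contains something other than whitespace.
                totals[language] += sum(1 for line in f if line.strip())
    return totals
```

Calling `count_loc(".")` and dividing each total by 1,000 gives a per-language kloc figure. Supporting a new language then means adding one entry to the table, which is the kind of small, well-specified change a model can one-shot.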

It was addictive. I just wanted to play. And I really enjoyed it.

What does it mean?

I’ve spent years of my life coding, but I’m increasingly of the view I’ll never code again. And I’m OK with that - I’ve come to realise that coding was a means to an end. I like building things. And if AI tools help me build faster, so much the better. Life is short - why dig a hole with a spade when you can use an excavator?

A couple of months ago I wrote about how I used Claude to build tools. Claude is impressive, but the process required iteration. Claude isn’t great at one-shotting. And the limited context window size meant I regularly ran out of tokens. Those problems are gone. o3-mini is great at one-shotting. And it provides gazillions of tokens - I’ve not run out yet.

o3-mini moves us firmly into the era of code composition. Give the AI the requirements and let it do the rest. For now I still have to drive the compilers and code management. But that will come too - Operator has given us a glimpse of how that might unfold.

The future

The shift from hand-coding to Claude was like upgrading from crawling to a bicycle. But o3-mini feels like a sports car. The speed is addictive. But it's not just about velocity - it's about what becomes possible when building becomes cheap. Tools that would have taken weeks can be built in hours. Ideas can be tested and refined in minutes rather than days. I had a long backlog of things I’d like to build “one day”. That list is starting to look increasingly empty.

We're entering an era where the limiting factor isn't our ability to code, but our ability to imagine what we want to build. If you doubted that AI would transform software development, then o3-mini is your Sputnik moment. We are on the brink of a massive transformation of the software industry. There will be new winners. And incumbents will fail. We’ve seen it before - the list of software companies that failed to adapt over the years is long. And we’ll see it again.

The question becomes: how do you adapt? How does your organization adapt? How do we as a society adapt? As I said upfront, I find myself both excited and terrified.

But for those of us who love building things, it's an exciting time. The excavator has arrived - and it's time to think bigger about what we can create. And about when and how you learn to operate it!

Martin Davidson
