The last invention of Gabriel Wells
How R1 brings AI reasoning home
Last Monday, the 20th of January, was a historic day. The news was mostly dominated by an, err, ‘event’ in Washington. But a release from DeepSeek, China’s open-weights frontier AI lab, was arguably just as significant.
DeepSeek released R1 - the first open-source reasoning model. Reasoning models differ from earlier LLMs in that they spend more time thinking before answering - the fancy name is ‘test-time compute’.
But what’s test-time compute? A good analogy is chess. The original LLMs - GPT-4o, Claude and so on - play speed chess: they give you the first answer they think of. Reasoning models carefully analyse each move before making it. This approach turns out to work well - very well - for certain problems: coding, maths, problem solving.
Until Monday, the only reasoning model we had was OpenAI’s o1 (although OpenAI previewed o3 at Christmas and are promising o3-mini will be released in a few weeks). But now we have R1. So how does it do? Impressively well. The evidence suggests it’s comparable with o1. Which is remarkable - the preview version of o1 appeared just four months ago, and the full version was released a month ago.
And this is from a Chinese lab - one which (in theory) doesn’t have access to the latest and greatest AI hardware. Even better, they released distilled versions of R1 that you can run locally on your own hardware.
What’s a distilled model? A smaller model trained to mimic the behaviour of a larger, more complex model. Rather than train the small model on the original training data, the big model ‘teaches’ the little model - and it turns out this teaching approach produces significantly better small models.
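To make the ‘teaching’ idea concrete, here is a minimal sketch of the core objective behind distillation: the student is trained to match the teacher’s full probability distribution over next tokens, typically by minimising a KL divergence. This is an illustrative toy, not DeepSeek’s actual training recipe, and the token probabilities are made up.

```python
import math

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student): the quantity a distilled 'student' model
    is trained to minimise against the teacher's output distribution."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

# A teacher's soft distribution carries far more signal than a one-hot
# label: it says 'red' is likely, 'crimson' plausible, 'banana' not.
teacher       = [0.70, 0.25, 0.05]   # e.g. P("red"), P("crimson"), P("banana")
good_student  = [0.65, 0.30, 0.05]   # close to the teacher -> low KL
naive_student = [0.34, 0.33, 0.33]   # near-uniform guess -> high KL

assert kl_divergence(teacher, good_student) < kl_divergence(teacher, naive_student)
```

The point of the soft targets is that every wrong-but-plausible token gets some probability mass, so the student learns the teacher’s relative preferences, not just the single correct answer.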
Reasoning models
Over the past few weeks I’ve spent a lot of time using o1. Reasoning models are quite different from traditional LLMs. The big difference is they can take a lot longer to generate an answer: o1 has taken 12 minutes to generate a reply for me, and 7+ minutes is not unusual.
They are not interactive - rather they are “set and come back later”. As a result I find myself working on several things in parallel. A coding program in one session, a second strand of the problem in another, and perhaps a test program in a third. It’s intense. And hard. More often than I’d care to admit I come back to a session and find myself wondering: what was I doing?
Secondly, good prompts matter. The better the prompt, the better the answer - and better means structure and length. A page of text is fairly normal. I like the advice here, which suggests structuring the prompt as:
Goal: what you want to achieve.
Return format: how you want the result returned.
Warnings: things for the model to avoid.
Context: as much relevant background information as you can provide.
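The four-part structure above is easy to mechanise. Here is a small sketch that assembles such a prompt - the function name and the example content are my own, not from any particular tool:

```python
def build_reasoning_prompt(goal, return_format, warnings, context):
    """Assemble a structured prompt for a reasoning model using the
    Goal / Return format / Warnings / Context layout."""
    return "\n\n".join([
        f"Goal: {goal}",
        f"Return format: {return_format}",
        f"Warnings: {warnings}",
        f"Context: {context}",
    ])

prompt = build_reasoning_prompt(
    goal="Design a caching layer for our image pipeline.",
    return_format="A numbered list of design options, each with trade-offs.",
    warnings="Do not assume we can change the storage backend.",
    context="Python service, ~10k requests/min, images are 1-5 MB.",
)
print(prompt)
```

In practice the Context section does most of the heavy lifting - that is where the ‘page of text’ goes.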
I’m too impatient to write a page myself, so I use Claude to generate a prompt for o1. For some things this becomes an iterative process - say I have a hard technical problem to solve:
Use Claude to create a prompt asking o1 to produce, say, ten ideas on how to tackle the problem.
Use o1 to generate ideas.
Refine the best idea with Claude and turn it into a new prompt to generate a solution.
Use o1 to generate the final solution.
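The four steps above amount to a small orchestration loop. Here is a sketch of that loop where `ask` is a hypothetical stand-in for a real API call (in practice it would use the Anthropic or OpenAI SDK) - the function names are mine:

```python
def ask(model, prompt):
    """Hypothetical stand-in for a real API call to Claude or o1.
    Returns a canned string so the sketch runs without API keys."""
    return f"[{model} response to: {prompt[:40]}...]"

def solve_hard_problem(problem):
    # 1. Claude drafts a long, structured prompt asking o1 for ideas.
    idea_prompt = ask("claude",
        f"Write an o1 prompt asking for ten ideas to tackle: {problem}")
    # 2. o1 generates the ideas (the slow, set-and-come-back step).
    ideas = ask("o1", idea_prompt)
    # 3. Claude refines the best idea into a new solution prompt.
    solution_prompt = ask("claude",
        f"Turn the best of these ideas into a solution prompt: {ideas}")
    # 4. o1 produces the final solution.
    return ask("o1", solution_prompt)
```

The pattern is the interesting part: the fast, interactive model does the prompt engineering, and the slow, expensive model is only called when the prompt is worth its thinking time.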
This works for more than just technical problems. I experimented with using the same approach to generate science fiction stories. And it worked pretty well. I now have over thirty ~10,000-word science fiction stories. And I turned one into an audiobook - it’s not the most original story, but I enjoyed it. You can listen to it here: The Last Invention of Gabriel Wells. And remember, this took about 15 minutes of my time to create - five to prompt o1, five to use ElevenLabs to convert it into an audiobook and another five to choose a voice I liked. Oh - and $5 in ElevenLabs credits.
So what about R1?
I’ve been experimenting with the distilled versions of R1 over the past few days - they are small enough that I can run them on my local machine. And they are interesting.
For the first time we get to see the reasoning process; o1 hides this. It’s fascinating.
However, the small model failed to generate a working version of Tetris - it forgot to define the constant for the colour red. So I asked it to investigate:
Amusingly it blamed me - ‘the user forgot’ - for the bug. Hmm. But the large publicly available model got it first time:
And it was quick - it took 17 seconds. It’s getting to the stage where it’s quicker to create a new game of Tetris than find, download and install an existing app.
But R1 is also censored. Ask about Taiwan and there is no thinking involved - the answer comes back immediately.
What about Anthropic?
You might well be asking: why don’t Anthropic have a reasoning model? For now Anthropic seem to be struggling with a lack of compute - it’s quite common for Claude to switch to concise responses due to heavy usage. And while it’s likely they have a reasoning model under development, perhaps this lack of compute is preventing them from deploying it? As to why they might be short of compute - no one seems to know. It’s unfortunate; an R1-style distilled version of Claude would be very interesting…
The pace of progress
It’s quite amazing how fast things are evolving. Four months ago, reasoning models didn't exist. Now we have multiple options - some running locally on consumer hardware. And the number of options will only increase from here. The pace of progress is breathtaking.
The barriers to entry are dropping rapidly. You no longer need specialised knowledge or expensive cloud computing resources. Anyone with a decent GPU or NPU can now run these models. This democratisation will accelerate innovation as more people experiment with and build upon these technologies.
Looking ahead
The release of R1 is another significant milestone in AI's rapid evolution. It’s not just the reasoning that is impressive - it’s also that the performance available from an 8-billion-parameter model has improved significantly. An 8-billion-parameter model is something you can run locally (the rough rule of thumb is you need 1GB of RAM for each billion parameters). And the evidence is there’s plenty of scope for further improvement. Distilled versions of the open-source equivalents of o3 are likely to squeeze even more performance from small models.
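The 1GB-per-billion-parameters rule makes it easy to check which models your machine can handle. A quick sketch - the 4GB operating-system headroom figure is my own assumption, and the rule itself presumes a heavily quantised model:

```python
def fits_locally(params_billions, ram_gb, headroom_gb=4):
    """Apply the rough rule of thumb: ~1 GB of RAM per billion
    parameters, leaving some headroom for the OS and other apps."""
    return params_billions * 1.0 <= ram_gb - headroom_gb

# On a typical 16 GB laptop, an 8B model fits but a 70B model does not.
for size in (1.5, 8, 14, 32, 70):
    verdict = "fits" if fits_locally(size, ram_gb=16) else "too big"
    print(f"{size}B parameters on 16 GB RAM: {verdict}")
```

Moving to a 32GB or 64GB machine shifts the cutoff upward in the obvious way, which is why the larger distilled variants remain workstation territory for now.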
We're moving from an era where AI was a remote service to one where it's a personal tool. And that’s going to change the world of software development.
Development cycles will continue to compress - when you can generate a working game in seconds, traditional software development practices become obsolete.
The focus will shift from writing code to defining problems and requirements effectively.
But perhaps most importantly, R1's release shows that the AI landscape isn't going to be dominated by a few large Western companies. Innovation is happening globally, and open source alternatives are emerging rapidly.
The flywheel where the current generation of models train the next generation is well and truly spinning. How fast it accelerates remains to be seen. But it seems almost certain that in a few months’ time R1 (and o1) will seem like toys compared to the newest models.