'Twas the night before AI-mas
December's AI avalanche
What a month. At the start of December things appeared to be slowing down. But anyone hoping the AI labs would spend the month focussed on turkey and trimmings was in for a harsh disappointment. December has turned out to be the busiest month in AI yet.
OpenAI kicked things off with their 12 days of OpenAI. Google then got into the mix with a swarm of releases. Meta and others joined in too. So much happened it was hard to keep up. The highlights?
OpenAI shipped the real version of o1. It's available on the Plus ($20 per month) and Pro ($200 per month) tiers.
OpenAI also finally shipped Sora, their video generator.
But the Sora excitement was short-lived. A few days later Sora was eclipsed by Google’s new Veo 2 video generator (sample above).
Google also shipped Gemini 2.0. From early testing it feels like Google have significantly closed the gap to Claude.
Then there are Project Mariner (the browser agent), Jules (the code agent), and Project Astra (the general agent).
And there are new image generators, Google’s Deep Research, interactive podcasts, ChatGPT projects, WhatsApp integration…
And then, as if that wasn’t enough, OpenAI rounded off last week with o3. It’s a seriously impressive - and scary - model, and it appears to be the best one so far by a substantial margin.
On the Codeforces leaderboard o3 ranks 175th. Putting that in context, o3 is in the top ~0.2% of human coders. Then there’s the ARC-AGI benchmark, which tests whether AI can solve novel reasoning puzzles that are easy for humans - a rough measure of progress toward AGI. GPT-3 scored 0%. GPT-4o did a bit better at 5%. But o3 trounces them all with a remarkable 87.5%. Earlier this year it was thought this benchmark would last for at least 5 years. That’s looking increasingly unlikely.
o3 is a vindication of the power of Test-Time-Compute (TTC). TTC is a technique where AI models are allowed to spend additional computational resources when answering a question - essentially doing multiple passes at solving a problem, often comparing different potential answers or approaches, before providing their final response. We first saw it with o1-preview three months ago in September - the progress since then has been astonishing.
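The core idea behind TTC can be sketched in a few lines. Below is a minimal majority-vote (often called "self-consistency") version - not OpenAI's actual method, which isn't public. The `sampler` argument is a hypothetical stand-in for one stochastic pass of a reasoning model:

```python
from collections import Counter

def answer_with_ttc(question, sampler, n_passes=16):
    """Spend extra compute at answer time: run several independent
    passes over the same question, then return the answer the passes
    converge on most often (simple majority vote)."""
    candidates = [sampler(question) for _ in range(n_passes)]
    best_answer, _votes = Counter(candidates).most_common(1)[0]
    return best_answer
```

More passes cost more compute but reduce the chance that one bad sample becomes the final answer - which is exactly the trade-off behind o3's eye-watering benchmark bill.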
For now there are a couple of drawbacks. First, o3 isn’t available to the public yet - we’ll have to wait a few more months. Second, it’s expensive to run. OpenAI are cagey about costs, but it appears the compute required to achieve the 87.5% on ARC-AGI cost ~$350k. So it’s not cheap. But it seems inevitable o3 will follow the traditional falling-cost arc that all tech follows.
A few months ago I wrote about the future of software development. In that article I described a workflow model: give a model a set of requirements and it does the rest - design, code and test. Since then folks have shipped agent tools (Replit, Devin, Jules and more). These tools get us closer to a workflow model. But o3 may well render those tools obsolete overnight - o3 may well be capable of doing all the reasoning and planning required for a workflow model …which is wild.
Using the tools
I’ve been playing with some of the new goodies over Christmas. The first task was to discuss a recent meeting with Claude. I wanted to get Claude’s views on what was said in the meeting - and discuss it. I had a video, but no transcript.
Step one: use the new Gemini 2.0 to generate a transcript. The original file was ~700MB in size, which is too big for Gemini. So I cunningly used Clipchamp to reduce the resolution to 480p. Unfortunately, while that reduced the quality, it also increased the file size to 2GB! So I split the audio from the video and gave Gemini just the audio.
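If you hit the same size limit, one common way to split the audio out is ffmpeg (this is a sketch, not necessarily how I did it, and it assumes ffmpeg is installed). Copying the audio stream avoids re-encoding, so it's fast and lossless:

```python
import subprocess

def extract_audio_command(video_path: str, audio_path: str) -> list[str]:
    # -vn drops the video stream; "-acodec copy" keeps the original
    # audio codec unchanged, so no re-encoding and no quality loss.
    return ["ffmpeg", "-i", video_path, "-vn", "-acodec", "copy", audio_path]

# To actually run it (requires ffmpeg on your PATH):
# subprocess.run(extract_audio_command("meeting.mp4", "meeting.m4a"), check=True)
```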
The transcript was impressive. Gemini identified the speakers and produced a timestamped, named output.
Step two: discuss the transcript. I asked Gemini for a summary - and it gave me a clear, balanced, neutral summary of the meeting. Even better, it flagged the things that weren’t being said - giving me extra insight that I could easily have missed.
It took five minutes to get some new, critical insights. Being able to discuss important real-world events with AI models is a game changer - this is a new superpower.
I also gave Claude the transcript. Claude had a similar viewpoint to Gemini - and also spotted the things that weren’t being said. To be honest, Claude and Gemini were very similar - I still have a preference for Claude, but the gap feels like it is closing. Gemini 2.0 is impressive.
Research agent
Google’s new research agent is cool. Give it a research area and it’ll go and search the web for information, collate what it finds and write it up. For, err, reasons I asked it about the future of fixed voice telephony…
It’s not perfect - in the example below it suggests the OTT market will grow from 11% in 2018 to 50% in 2018?
But overall this is a useful tool if you need an overview of an unfamiliar area.
Perception
AI is currently at a weird point. The technology has become seriously useful. I use it to help me write. To research, to code. To create music, to learn Spanish. To discuss my career. To figure out holiday plans. If Claude disappeared tomorrow I’d really miss it - my life would be measurably worse. I’ve got a digger; I don’t want to go back to using a spade.
Yet the public seem to hate AI. They don’t want it. Public attention in the US is just 0.25%. And that matches my experience. I know many people who’d benefit from AI but just don’t want to engage with it. But for those of us who do engage it gives us an edge. A significant edge.
The pace is not slowing. Next year promises to be another wild year. Whether you're coding, writing, or running a company, AI is no longer optional - it's a competitive necessity. Those who embrace it gain a significant edge; those who don't risk being left behind…