If I were a rich man...
How to turn $5.6M into minus $1tn
If I had $5.6 million I could buy a 3 bedroom flat in Mayfair in London.
Or perhaps some other things (this comes from DeepSeek - the broken formatting isn’t something I’ve seen for a while)…
Or perhaps I might spend it training my own AI model, release it to the world and wipe $1tn (including almost $600 billion from Nvidia alone) off the stock market. Which is what DeepSeek have done.
But the widely bandied $5.6 million training cost for DeepSeek R1 isn’t entirely accurate. It only represents the compute cost for the final training run of DeepSeek V3 (a ‘standard’ model, without test-time compute), which was released on Christmas Day. There are other costs - data annotation, staff, failed training runs - plus the cost of training R1 from V3. So it’s more than $5.6 million. But likely not much beyond $100 million.
As background, the $5.6 million number comes from the DeepSeek V3 paper, which reports a total of 2.788 million H800 GPU hours for training. Assume H800 rental at $2 per hour and you get $5.576 million - rounded, $5.6 million.
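The arithmetic behind the headline number is simple enough to check - GPU-hour total from the V3 paper, rental rate assumed:

```python
# Headline training-cost arithmetic from the DeepSeek V3 paper:
# 2.788 million H800 GPU hours at an assumed $2 per GPU hour.
gpu_hours = 2.788e6
rate_per_hour = 2.00  # assumed H800 rental price, USD

cost = gpu_hours * rate_per_hour
print(f"${cost / 1e6:.3f} million")  # → $5.576 million
```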
The fallout
Aside from rattling investors there is plenty of other fallout:
The US labs are rattled. Meta’s AI division is reported to be panicking. DeepSeek comprehensively beats Llama 4 (currently in training); current work is apparently being thrown away and they (like others) are rapidly pivoting to copying the DeepSeek approach.
It’s possible the US sanctions preventing export of the best GPUs to China have backfired; instead DeepSeek have found ways to do more with the limited resources they have.
OpenAI are likely to rush out o3-mini in the next few weeks. o3-mini isn’t quite as good as o1, but it’s a lot faster - and cheaper.
In the short term it likely strengthens OpenAI’s position for their Stargate project. US fears of losing the race for AGI will result in even more money being poured into the project.
It raises the question - if a small lab in China is willing to release this model publicly then do they have better models they are not sharing?
Yet - is it all that bad?
DeepSeek have found a way to improve the efficiency of AI - a way to get more from existing hardware. And it appears to be a scalable approach: provide more hardware, and the models will get better. It’s the bitter lesson writ large. The US labs, with their better hardware, could - should - do well from this?
The other, critical, aspect is that the R1 training loop doesn’t involve humans. Humans aren’t checking the data, or providing any feedback. The model is doing that all itself. It works out the test cases to train itself on and then trains itself. This emergent behaviour is both amazing and scary. And it is scalable. Better hardware means faster training, which means better models, which means faster training, which means… The flywheel spins faster.
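That human-free loop can be sketched in miniature. This is a toy illustration only - assuming, as the R1 paper describes, GRPO-style reinforcement learning with rule-based (verifiable) rewards; the samples and checker here are stand-ins, not DeepSeek’s actual code:

```python
import statistics

def rule_based_reward(answer: str, expected: str) -> float:
    # Verifiable reward - no human feedback, just a checkable answer.
    return 1.0 if answer.strip() == expected else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    # GRPO-style: score each sample relative to its own group,
    # so no separate learned value model is required.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]

# A toy "group": several sampled answers to the same self-generated problem.
samples = ["42", "41", "42", "7"]
rewards = [rule_based_reward(s, "42") for s in samples]
advantages = group_advantages(rewards)
# Correct samples get positive advantage (reinforced), wrong ones negative.
```

No human appears anywhere in the loop: the reward comes from a mechanical check, and the relative scoring tells the model which of its own attempts to reinforce.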
Finally it shows AI progress isn’t confined to the big players. A smart upstart can still - just - beat the established players. Maybe there is hope for Europe - though for now it appears to be a two-horse race between China and the US.
On the other hand, maybe not. China has invested heavily in AI since AlphaGo beat Lee Sedol in March 2016. That match marked a turning point in China's approach to AI. The defeat of a world champion in Go (Weiqi), a game deeply rooted in East Asian culture and long thought to require uniquely human intuition, shocked the ~280 million Chinese viewers watching the match. Worse, Lee Sedol was beaten by a Western company (DeepMind/Google). Twelve months later China’s State Council released their “New Generation Artificial Intelligence Development Plan", which set the goal of making China a world leader in AI by 2030. And it was accompanied by significant investment.
Europe hasn’t had such a moment - and arguably the EU AI Act makes it even harder for Europe to compete. Political inertia means it may now be too late to catch up.
As an aside, it transpires DeepSeek isn’t keen to talk about China’s AI program, censoring its part-written reply…
In other news
OpenAI released their “Operator” feature. It’s much like normal ChatGPT, except that the model also has access to a VM with a browser. So it can search the web and interact with websites. The idea is that it can do research for you and then, perhaps, buy tickets, place orders etc.
But I thought I’d see if it could play Doom…
After about 5 minutes it found an online version of Doom - and we were off!
Or maybe not. It’s not really an action-packed experience… Even if Operator could move the player, I doubt it would survive long - by the time it had worked out where the enemy was, the game would be over!
Local R1
Last Friday I discussed running DeepSeek R1 locally - and failing to create a game of Tetris. But I’ve solved that. Using the 32-billion-parameter Qwen distillation I have now created - locally - a one-shot (i.e. created with a single prompt) working version of Tetris!
On Friday the hosted website took 17 seconds. My PC is slower - it took 189 seconds. So roughly 11x slower. But still… this is amazing.
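For the record, the slowdown works out as:

```python
hosted_seconds = 17   # DeepSeek's website on Friday
local_seconds = 189   # 32B Qwen distillation on my PC

print(f"{local_seconds / hosted_seconds:.1f}x slower")  # → 11.1x slower
```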
And that’s just today. Where will we be in six months? Or a few years’ time?
DeepSeek's breakthrough shows how quickly things change. While many are focused on the $5.6 million headline, we've quietly gained local reasoning models, new training paradigms - and Operator. The pace doesn’t show any signs of slowing - if anything, it's accelerating.
What will next week bring? 😊