Birthday presents
Why I'm no longer sure I want to see what's waiting downstairs
I still remember the excitement of birthdays when I was a child. Getting up, running downstairs and excitedly opening presents. As you get older that excitement fades. But AI has brought some of it back - every morning I hobble down the stairs and excitedly look to see what progress Claude and Codex have made overnight. Sometimes magic has happened - a tricky bug fix, a game I’ve not seen for years running in the emulator, new functionality suddenly working.
But some days I’m met with blankness - Windows update has decided my box was inactive (no, it wasn’t) and rebooted to install updates. Those days make me sad - development stopped. It’ll take time to get going again.
I find myself wondering how long it’ll be before a more capable model decides it knows better and blocks the update.
Claude Mythos
And models are getting more capable. Claude Mythos was announced last week. Mythos appears to be a step-change, at least in cybersecurity. So much so that it’s not being made generally available.
It has proved highly adept at finding security holes in existing software; Mythos is in a different league from previous models. It found a 27-year-old DoS vulnerability in OpenBSD, a Linux privilege escalation, and Firefox JIT heap spray attacks that escaped both the renderer and the OS sandboxes. It seems particularly good at linking together smaller exploits to turn them into something more serious.
Such is the volume of exploits that only a subset of them are being shared to avoid overloading maintainers.
Source code helps, but Mythos doesn’t need it; it’s apparently pretty good at reconstructing plausible source code from a binary (in fairness, Opus 4.6 is pretty amazing at this too). Then it can use that to find more exploits.
Mythos has found new exploits in existing critical infrastructure. Windows, Linux, browsers. And that’s software which is already pretty well hardened. You’ve got to think that less hardened / older software is going to be a rich playground.
It’s hard to avoid some uncomfortable conclusions:
We’re likely to see an uptick in exploits starting now. Why now? Because anyone sitting on a zero-day exploit now has a shrinking window in which to use it; there’s a good chance Mythos has found the same exploit and folks are working to fix it.
We now know models can find - and exploit - security vulnerabilities incredibly well. Mythos’s overall exploit success rate jumped to 72.4%, compared with 14.4% for Opus 4.6. OpenAI’s Spud model is likely to be just as capable as Mythos; open-source models are likely 3-9 months behind.
The race between the good and bad folks is on. The good folks have a head start; will it be enough? We’re about to find out.
As Anthropic note:
Ultimately, it’s about to become very difficult for the security community. After navigating the transition to the Internet in the early 2000s, we have spent the last twenty years in a relatively stable security equilibrium. New attacks have emerged with new and more sophisticated techniques, but fundamentally, the attacks we see today are of the same shape as the attacks of 2006.
It’s not a surprise
If you’ve any experience of working on legacy software products in large companies none of this will come as a surprise. Legacy software uses obsolete languages, outdated architectures, and is often poorly tested. Worse, commercial companies invariably focus on features; security and quality are almost always overlooked. Things have to get bad before large companies get interested in security.
Take Windows. It is largely written in C. C is a memory-unsafe language. Windows dates from before we properly understood how to build secure software. It has been patched and fixed over the decades since. But it’s a bit like trying to eliminate draughts in a 1930s house - no matter how many gaps you fix there always seems to be somewhere else the wind will get in.
It’s hard to claim Windows is high quality. The constant patching tells its own story. The instability when it runs out of memory. Windows Explorer’s love of restarting. The corruption of the icon cache. Endless bugs which have persisted for years. And these are the visible ones. What about all the others we don’t see?
The focus on features over quality doesn’t help. The history doesn’t help; today we’d use a memory-safe language like Rust; we’d architect security in. And then there’s the obsession with backwards compatibility. NTLM - an easily crackable auth protocol from the 90s - is only being disabled this October. It should have gone decades ago. SMB, print spooler, NetBIOS, accessibility - all have been attacked because of a desire to retain backwards compatibility.
What do we do?
I’ve written before about the changing calculus of rewriting versus refactoring versus doing nothing. Mythos feels like another strong nudge towards rewriting.
Over the past few weeks I’ve (==Claude) ported the Opus/Silk codec to Rust. An H.264 port is also in progress. These ports are really simple - agree on the architecture/HLD, then let Claude implement against a perfect oracle - the original codec. Once the functionality is solid, set Claude off autonomously exploring perf improvements (this is done in a worktree - if a change is promising then we test thoroughly and only then pull it into main).
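To make "implement against an oracle" concrete, here is a minimal sketch of differential testing in Rust. It is not the real port: the toy delta encoder, the function names (`oracle_encode`, `ported_encode`, `buggy_encode`, `first_mismatch`) and the small xorshift generator are all assumptions for illustration. In a real port the oracle would more likely be the original C codec called over FFI.

```rust
// Hypothetical stand-in for the original codec (the oracle).
// Toy behaviour: delta-encode samples with wrapping arithmetic.
fn oracle_encode(samples: &[i16]) -> Vec<i16> {
    let mut out = Vec::with_capacity(samples.len());
    let mut prev: i16 = 0;
    for &s in samples {
        out.push(s.wrapping_sub(prev));
        prev = s;
    }
    out
}

// Hypothetical new implementation under test: same behaviour, different style.
fn ported_encode(samples: &[i16]) -> Vec<i16> {
    std::iter::once(0i16)
        .chain(samples.iter().copied())
        .zip(samples.iter().copied())
        .map(|(prev, cur)| cur.wrapping_sub(prev))
        .collect()
}

// A deliberately wrong port: saturating instead of wrapping subtraction,
// so it disagrees with the oracle whenever the difference overflows i16.
fn buggy_encode(samples: &[i16]) -> Vec<i16> {
    let mut prev: i16 = 0;
    samples
        .iter()
        .map(|&s| {
            let d = s.saturating_sub(prev);
            prev = s;
            d
        })
        .collect()
}

// Tiny deterministic xorshift generator so failures are reproducible
// from the seed alone (no external crates assumed).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Feed the same random inputs to both implementations and return the
// first input on which they disagree, or None if all cases match.
fn first_mismatch<F, G>(oracle: F, candidate: G, cases: usize, seed: u64) -> Option<Vec<i16>>
where
    F: Fn(&[i16]) -> Vec<i16>,
    G: Fn(&[i16]) -> Vec<i16>,
{
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    for _ in 0..cases {
        let len = (rng.next_u64() % 64) as usize;
        let input: Vec<i16> = (0..len).map(|_| rng.next_u64() as i16).collect();
        if oracle(&input) != candidate(&input) {
            return Some(input);
        }
    }
    None
}
```

Calling `first_mismatch(oracle_encode, ported_encode, 1000, 42)` should come back as `None`, while swapping in `buggy_encode` should return a failing input that can be handed straight back to the model as a reproducible bug report. The appeal of this setup is that nobody has to write expected outputs by hand; the original implementation supplies them.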
Now I’m not for a second suggesting that reimplementing Windows or Linux is anywhere near as simple as reimplementing a codec. But there’s a path here. And there’s lots of critical software that is closer to a codec than an operating system. Maybe lots of people are busy reimplementing legacy code to remove memory-unsafe languages. But I doubt it.
Nor am I under any illusion that the teams who don’t have access to Mythos are using Opus 4.6 to hammer their code looking for exploits. Sadly I expect many will be wasting time trying to measure productivity improvements from GitHub Copilot. They should be pitting as many copies of Opus 4.6 as they can parallelize against their codebases - starting with the publicly visible bits of code. Got a complex protocol engine written in C sitting on the public internet? You really need to go hammer it. You need to be the one finding the exploits first.
Vibe coding isn’t going to help. We’re seeing an explosion of poorly written software - written by non-engineers who don’t understand how to build quality software. Some of this software will inevitably work itself into critical positions - and then be exploitable. I’m already hearing stories of C-suite execs building their own custom tools. Some of these are of, err, questionable quality. How long before one of these custom apps is exploited to provide an easy path to company confidential data?
Understanding how to build secure, high quality software has never been so important.
And so?
I’m beginning to wonder if the "presents" I find in the morning will be ones I want. The risk of coming down to a hacked machine - or worse, a broken internet - has significantly increased. Mythos gives us a stark warning of what’s coming. In the long run we’ll end up with significantly better software, but the process of getting there is not going to be smooth.


HB indeed.
In times like these when consumers need reassurance that the companies they are buying from take security seriously, brands will become paramount. Particularly when the details will elude all but the most knowledgeable. I wonder if there will be an opening for security validation brands as secondary marketing, much like Dolby became a byword for audio quality on HiFi equipment made by many manufacturers. "Security by Mythos" has a ring to it...