Spot the difference

Rebuilding Windows 3.1 with Claude and Codex

Feb 17, 2026

I grew up in the pre-iPad, pre-iPhone, pre-internet era. There were fewer forms of entertainment back then, but one I really enjoyed was the spot-the-difference puzzle book. Inspired by those days, I present to you three images so you too can play spot the difference!

Yes, it’s classic Minesweeper. But why three versions? And why has one got a different window border?

The one on the left is Minesweeper running under Windows 3.1. The one in the middle is Minesweeper running in Wine (Wine translates the older Win16 API calls used by Windows 3.1 applications to more modern Windows APIs). And the third? Well, that’s the 386/DOS/Win3.1 emulator Claude, Codex & I have been building.

Windows 3.1 is nearly 34 years old. But it remains an incredible engineering feat. 600 APIs. The first really popular GUI OS. A team of hundreds for 7 years. 3 million lines of code. The idea of reimplementing it is ridiculous.

But here we are. I am, frankly, in awe of Opus 4.6.

Architecture

I’m biased, but the architecture is neat. Win16 apps run under the 386 emulator - the application’s own 16-bit code executes on the emulated CPU, just as it would on the original hardware. There’s a loader which recognizes the "NE" format of Win3.1 executables. One of the key jobs of the loader is to handle relocations - the part where the loader connects the application to the Windows API.
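To make "recognizes the NE format" concrete, here is a minimal sketch of the very first check a loader might perform. It is not the emulator's actual code: the `ExeKind` type and `classify` function are names I've made up for illustration. What it relies on is the documented file layout: a DOS "MZ" stub whose field at offset 0x3C points to the new-style header, which begins with the bytes "NE" for 16-bit Windows programs.

```rust
/// Offset in the DOS ("MZ") header of the 32-bit little-endian pointer
/// to the new-style header.
const E_LFANEW_OFFSET: usize = 0x3C;

/// Hypothetical classification result, invented for this sketch.
#[derive(Debug, PartialEq)]
enum ExeKind {
    /// 16-bit "New Executable", the format used by Windows 3.1 programs.
    Ne,
    /// An MZ file whose new-style header is missing or is some other format.
    OtherMz,
    /// Not an MZ executable at all.
    NotExecutable,
}

/// Classify an executable image by looking only at its signatures.
fn classify(image: &[u8]) -> ExeKind {
    // The file must be long enough to hold the pointer and start with "MZ".
    if image.len() < E_LFANEW_OFFSET + 4 || !image.starts_with(b"MZ") {
        return ExeKind::NotExecutable;
    }
    let p = E_LFANEW_OFFSET;
    let raw = [image[p], image[p + 1], image[p + 2], image[p + 3]];
    let ne_offset = u32::from_le_bytes(raw) as usize;

    // `get` returns None if the pointer runs past the end of the file.
    if image.get(ne_offset..ne_offset.saturating_add(2)) == Some(&b"NE"[..]) {
        ExeKind::Ne
    } else {
        ExeKind::OtherMz
    }
}
```

A real loader goes on to read the segment table, the imported-name tables and the per-segment relocation records; this sketch stops at the signature check.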

In native Windows 3.1 the loader would patch up the executable with the addresses of the real Windows API functions. Want to call "CreateWindow"? The loader writes the actual address of CreateWindow into the jump table in the executable. 16-bit application code calls 16-bit Windows code.

But the emulated version of Windows 3.1 is written in 64-bit Rust. So we need to get from 16-bit land to 64-bit. And the loader is critical - it sets a magic jump address which encodes both the system module - KERNEL, USER, GDI, SHELL etc - and the id (or "ordinal") of the function.
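One way such a magic address could be laid out is sketched below. The layout is purely an assumption for illustration: the reserved segment range, the module ids and the names `FarPtr`, `encode_thunk` and `decode_thunk` are all hypothetical, and the real emulator may encode things quite differently. The idea is simply that a 16:16 far pointer has room for both a module id and an ordinal.

```rust
/// System modules an import can refer to (names from the text; ids are made up).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Module {
    Kernel = 0,
    User = 1,
    Gdi = 2,
    Shell = 3,
}

/// Assumption: segments 0xFF00..=0xFFFF never hold real code and are
/// reserved for API thunks.
const THUNK_SEGMENT_BASE: u16 = 0xFF00;

/// A 16:16 far pointer as seen by the 16-bit application.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FarPtr {
    segment: u16,
    offset: u16,
}

/// What the loader would write into the executable for one imported function.
fn encode_thunk(module: Module, ordinal: u16) -> FarPtr {
    FarPtr {
        segment: THUNK_SEGMENT_BASE | module as u16,
        offset: ordinal,
    }
}

/// What the emulator would do when it sees a far call target.
/// Returns None for ordinary addresses that point at real 16-bit code.
fn decode_thunk(ptr: FarPtr) -> Option<(Module, u16)> {
    if ptr.segment & 0xFF00 != THUNK_SEGMENT_BASE {
        return None;
    }
    let module = match ptr.segment & 0x00FF {
        0 => Module::Kernel,
        1 => Module::User,
        2 => Module::Gdi,
        3 => Module::Shell,
        _ => return None,
    };
    Some((module, ptr.offset))
}
```

With this layout, `encode_thunk(Module::User, 41)` yields the far pointer `FF01:0029`, and decoding it gives back the module and ordinal unchanged.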

Then when the Win16 app makes a system call, the emulator notices the magic jump address and drops straight into a 64-bit Rust implementation of the original Windows function. What was a 16-bit far call from 1991 lands directly in modern Rust.
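The dispatch step might look roughly like the sketch below. Everything in it is a simplified stand-in rather than the project's real code: the `Cpu` struct holds three registers, `API_SEGMENT` is an invented marker segment, and handlers are looked up in a plain `HashMap` keyed by the offset the loader chose.

```rust
use std::collections::HashMap;

/// Hypothetical marker segment: far calls into it are API calls, not real code.
const API_SEGMENT: u16 = 0xFFFF;

/// Minimal stand-in for the emulated CPU state. AX carries a 16-bit result.
#[derive(Default)]
struct Cpu {
    ax: u16,
    cs: u16,
    ip: u16,
}

/// A native Rust implementation of one API function.
type Handler = fn(&mut Cpu);

struct Emulator {
    cpu: Cpu,
    /// Keyed by the offset part of the magic address.
    handlers: HashMap<u16, Handler>,
}

impl Emulator {
    /// Called when the 16-bit code executes a far call.
    /// Returns true if the call was served natively in Rust.
    fn far_call(&mut self, segment: u16, offset: u16) -> bool {
        if segment == API_SEGMENT {
            // Copy the fn pointer out so the map borrow ends before the call.
            if let Some(handler) = self.handlers.get(&offset).copied() {
                handler(&mut self.cpu);
                return true;
            }
        }
        // Ordinary call (or an unknown id in this sketch): keep emulating
        // 16-bit code at the target address.
        self.cpu.cs = segment;
        self.cpu.ip = offset;
        false
    }
}

/// Illustrative handler: a real one would read its arguments from the
/// emulated stack and implement the Windows function's behaviour.
fn example_handler(cpu: &mut Cpu) {
    cpu.ax = 0x1234;
}
```

A real implementation also has to pop the arguments the application pushed, since Win16 API functions clean up their own stack frames, and would report an unimplemented ordinal rather than silently jumping to it.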

Going the other direction is harder. When Windows needs to call back into the application - say, delivering a WM_CREATE message to the app’s window procedure - Rust can’t call the 16-bit code directly. Instead it sets up a trampoline: push the message parameters onto the emulated stack, push a special return address as a marker, point the CPU at the app’s window procedure, and hand control back to the emulator. The 16-bit code runs normally until it returns, hits the marker address, and the emulator picks up the result and carries on.
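Here is a toy version of that trampoline. It is a sketch under heavy assumptions: the emulated stack is just a `Vec<u16>`, `RETURN_MARKER` is an invented address, and instead of a real instruction decoder there is a `guest_step` function pointer standing in for "run one piece of 16-bit code". The argument order follows the Win16 Pascal calling convention (pushed left to right, with the callee removing them) and the result comes back in DX:AX.

```rust
/// Hypothetical marker address: no real code lives here, so reaching it
/// means the callback has returned.
const RETURN_MARKER: (u16, u16) = (0xFFFE, 0x0000);

const WM_CREATE: u16 = 0x0001;

struct Machine {
    stack: Vec<u16>, // stand-in for the emulated SS:SP stack
    cs: u16,
    ip: u16,
    ax: u16,
    dx: u16,
    /// Stand-in for fetching and executing guest code at cs:ip.
    guest_step: fn(&mut Machine),
}

impl Machine {
    fn push(&mut self, word: u16) {
        self.stack.push(word);
    }

    /// Call a 16-bit window procedure from Rust and return its 32-bit result.
    fn call_wndproc(&mut self, proc_addr: (u16, u16), hwnd: u16, msg: u16,
                    wparam: u16, lparam: u32) -> u32 {
        // Message parameters, left to right; lParam high word first.
        self.push(hwnd);
        self.push(msg);
        self.push(wparam);
        self.push((lparam >> 16) as u16);
        self.push(lparam as u16);
        // Far return address (segment, then offset) pointing at the marker.
        self.push(RETURN_MARKER.0);
        self.push(RETURN_MARKER.1);
        // Point the CPU at the window procedure and run until the marker.
        self.cs = proc_addr.0;
        self.ip = proc_addr.1;
        while (self.cs, self.ip) != RETURN_MARKER {
            let step = self.guest_step;
            step(self);
        }
        ((self.dx as u32) << 16) | self.ax as u32
    }
}

/// Fake guest code: a window procedure that returns the message id and
/// then does the equivalent of a far return that discards its 5 argument words.
fn fake_wndproc_step(m: &mut Machine) {
    let ip = m.stack.pop().unwrap();
    let cs = m.stack.pop().unwrap();
    let args = m.stack.split_off(m.stack.len() - 5); // [hwnd, msg, wparam, hi, lo]
    m.dx = 0;
    m.ax = args[1];
    m.cs = cs;
    m.ip = ip;
}
```

Calling `call_wndproc` with `WM_CREATE` on a `Machine` whose `guest_step` is `fake_wndproc_step` returns 1 and leaves the stack empty, which is the property that matters: Rust regains control exactly when the 16-bit code returns.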

So far it’s 60k lines of production code and 200k lines of tests.

Now we’re far from being able to run every Windows 3.1 app. And there are some graphical rendering bugs - the client area origin has an offset and the font isn’t quite right. But the direction of travel is clear. And being able to play Minesweeper…

The limits

Building things like this is at the limit of what Claude and Codex can currently achieve. This is very much not one-shottable. Or GitHub Copilot-able. It requires multiple Claude and Codex instances running for many hundreds of hours. It requires good oracles. It requires no human-written code. No human-reviewed code.

That last part is controversial. But true.

The first cut of the emulator was written in early December using a Ralph loop on a train. But Ralph loops feel quite limiting now - single agent, single repo. For a while I experimented with my own orchestrator. But now I'm using Claude's built-in agent-team support. This works incredibly well - it’s easy to get 3+ agents working for 5-6 hours. The workflow has changed more in six months than in the previous 30 years.

Others are noticing the changes too - and offering frameworks to reason about the different levels within. Geoff Huntley has written about the six (or seven?) stages of AI adoption by software engineers. And Steve Yegge has 8 stages of agentic workflow evolution. Their scales are primarily about building fast; in both cases large-scale orchestration of the Claude agent team is required to build things at maximum speed.

And so?

It’s not hard to spot the difference AI brings. Thirty years ago, writing Windows took a team of hundreds and seven years of work. Now it’s turning into a side project over a couple of months. The three screenshots at the top of this article look nearly identical. The process that created them couldn’t be more different.

Martin Davidson
