The holes in AI math claims
How a simple geometry problem caused benchmark superstars to get stuck in the mud
I’m building a workshop in my garden. Or, in the words of my family, digging a hole. They delight in telling me how good I am at digging holes… and not just in the garden. But I digress.
The site is an awkward shape in the back corner of our triangular garden. Six months ago I measured and calculated the area to order the correct volume of concrete. But since then we’ve had a crop of new multi-modal models (which can ‘see’ diagrams) so I thought I’d get a cross-check. Enter Gemini, Claude & o3-mini.
Current thinking models do very well on math benchmarks. For example, Claude scored 80% on the American Invitational Mathematics Examination (AIME). So a simple geometry problem should be easy? I broke the shape into a right-angled triangle with legs of 1500 and 9410 mm, which leaves a second, almost right-angled triangle with sides of 6860 and 7150 mm. Alternatively, use Pythagoras to get the hypotenuse of the first triangle, then Heron’s formula for the second. Can’t be that hard, can it?
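As a rough cross-check of the two approaches above, here is a minimal Python sketch. It assumes the shape is a convex quadrilateral with the right angle between the 1500 and 9410 mm sides, and that the 6860 and 7150 mm sides meet at the opposite corner. The function names are illustrative only.

```python
import math


def heron(a: float, b: float, c: float) -> float:
    """Area of a triangle from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def exact_area_mm2(leg1=1500.0, leg2=9410.0, side3=6860.0, side4=7150.0) -> float:
    """Right-angled triangle plus the triangle built on its hypotenuse.

    Assumption: convex quadrilateral, right angle between leg1 and leg2.
    """
    diagonal = math.hypot(leg1, leg2)  # Pythagoras
    return 0.5 * leg1 * leg2 + heron(side3, side4, diagonal)


def shortcut_area_mm2(leg1=1500.0, leg2=9410.0, side3=6860.0, side4=7150.0) -> float:
    """Approximation: pretend the second triangle is right-angled too."""
    return 0.5 * leg1 * leg2 + 0.5 * side3 * side4


def second_angle_deg(leg1=1500.0, leg2=9410.0, side3=6860.0, side4=7150.0) -> float:
    """Angle between side3 and side4, from the law of cosines."""
    diagonal_sq = leg1**2 + leg2**2
    cos_angle = (side3**2 + side4**2 - diagonal_sq) / (2 * side3 * side4)
    return math.degrees(math.acos(cos_angle))


print(f"Exact:    {exact_area_mm2() / 1e6:.2f} m^2")
print(f"Shortcut: {shortcut_area_mm2() / 1e6:.2f} m^2")
print(f"Angle between the 6860 and 7150 sides: {second_angle_deg():.1f} degrees")
```

Run as written, this should print about 31.51 m^2 for the exact method and 31.58 m^2 for the shortcut, with the second triangle's angle coming out near 85.7 degrees. The shortcut overestimates slightly because that angle is a little under 90 degrees.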
I covered up my answer to avoid influencing the models (although the paper moved a little - which turned out to be interesting):
The results were… not what I expected. Claude & o3-mini-high took a long time before confidently giving the wrong answer. Gemini was in a different league: quick and correct.
o3-mini-high: 5m², 106s
Claude: 24.4m², 103s
Gemini: 31.5m², 32s
The chains of thought were illuminating - let’s have a look at them to understand what happened.
o3-mini-high
First, let’s restate what we can glean from the drawing and labels (as best as one can interpret from a single photo):
The figure appears to be a three‐sided shape (a triangle) with side lengths labeled 7500 mm, 6860 mm, and 1500 mm.
Converting to meters, these become 7.50 m, 6.86 m, and 1.50 m.
In the middle of the shape there is a handwritten note saying “5 m²,” suggesting someone believes its area is around 5 m².
There is also a small right‐angle marking in the top corner of the drawing plus an angle “94.10” noted near another corner. The diagram’s angle markings can be confusing, but the simplest consistent interpretation (given the side labels and the area note) is that the actual boundary lengths are 7.50 m, 6.86 m, and 1.50 m, forming a triangle.
o3-mini-high seems to have decided the shape is a triangle and interpreted ‘9410’ as an angle. It also spotted my partially covered-up answer and read it as 5m². And that seems to influence its final answer - it’s as if the model is trying to find a solution which gives an answer of 5m².
The computed area of approximately 4.85 m² is very close to 5 m²—which is what is noted on the drawing. Minor discrepancies can arise from rounding of measurements or from the drawing not being perfectly to scale.
It was so far off, I asked if the model was sure. And it confirmed it was confident in the 5m² answer. Minor discrepancies, eh?
Claude
Claude got very confused. Initially it spotted a quadrilateral. But one with five sides…
I can see a quadrilateral shape in the image with several dimensions labeled. Let me identify all the dimensions:
One side is labeled as 7150 (mm)
Another side is labeled as 1500 (mm)
Another side is labeled as 9410 (mm)
Another side is labeled as 5m2 (which seems to be the area rather than a dimension)
The remaining side is labeled as 6860 (mm)
The shape appears to be a quadrilateral, specifically a trapezoid or an irregular quadrilateral. The "5m2" notation is confusing as it appears to be indicating an area (5 square meters) rather than a length. Let me double check my understanding of the diagram.
Then it had a change of heart:
Looking more carefully, I see this is actually a triangular shape with dimensions:
One side is labeled as 7150 mm
Another side is labeled as 9410 mm
The third side is labeled as 6860 mm
What I initially thought was "5m2" is actually "5m2" written vertically, which seems to be the label for a shorter dimension or possibly a measurement within the triangle.
Later on it got the shape correct:
Looking more carefully at the diagram, I see that the shape appears to be a quadrilateral with dimensions labeled on the sides. The numbers shown are:
1500 mm (top side)
7150 mm (left side)
6860 mm (bottom side)
9410 mm (right side)
There's also what appears to be "5m2" in the middle, which might be indicating the area.
But then it went back to seeing a triangle…
Wait, on second thought, I don't think this is a quadrilateral. The lines form what appears to be a triangle. Let me look again carefully.
Upon careful inspection, the shape is indeed a triangle with:
Left side: 7150 mm
Right side: 9410 mm
Bottom side: 6860 mm
The number 1500 appears to be at the top of the triangle, perhaps indicating the height or another measurement.
And calculated an area which it decided was very large:
Area = √(595,427,128,000,000) ≈ 24,400,000 mm²
Wait, this seems too large. Let me double-check my understanding of the diagram and my calculations.
Then it tried to find a solution that would give an answer of 5m².
But wait, there's a "5m2" label in the middle of the triangle. This strongly suggests that the actual area is 5 square meters, not the 24.4 square meters I calculated.
If we take the bottom side as the base: Area = (1/2) × 6860 mm × 1500 mm = 5,145,000 mm²
Converting to square meters: 5,145,000 mm² = 5.145 m²
This is very close to the "5m2" label in the middle of the triangle, which strongly suggests this is the correct interpretation and calculation.
This went on for some time, with Claude unsure what shape it was dealing with, failing to spot the right angle in the corner and seemingly determined to find a way to get an answer of 5m². It even decided 24.4m² was a bizarre number…
In square meters: 24,405,538 mm² = 24.405538 m²
This is the area of the triangular shape based on the three side dimensions given.
Wait, that's a really bizarre number.
Gemini
Gemini was in a different league.
Identify the Knowns:
We have a quadrilateral with sides 1500 mm, 9410 mm, 6860 mm, and 7150 mm.
There is a right angle (90°) between the sides of length 1500 mm and 9410 mm.
Strategy: Divide into Triangles
The right angle allows us to easily divide the quadrilateral into two triangles using a diagonal. Let's draw a diagonal connecting the ends of the sides not forming the right angle (i.e., connecting the top-left corner to the bottom-right corner).
This divides the shape into:
Triangle 1: A right-angled triangle with legs 1500 mm and 9410 mm.
Triangle 2: A triangle with the remaining two sides (6860 mm and 7150 mm) and the diagonal we just mentally drew.
which led to…
The area of the shape, given the right angle, is approximately 31,529,604 mm².
(Note: The small sticky note partially covering the drawing says ".5m2". 31,529,604 mm² is equal to 31.53 m², which seems consistent with a potential handwritten area calculation ending in .5 m²)
Notice that Gemini spots that the number in the middle is point 5m² - which both Claude and o3 missed. Although I disagree that 31,529,604mm² is an approximate size!
So what?
I’ll be honest. I was surprised. I expected better.
But two things became clear:
Claude & o3-mini-high struggled to interpret the diagram - sometimes ‘seeing’ a triangle, other times missing the critical right angle.
Claude & o3 got distracted by the .5m² in the middle of the diagram - confirmation bias was well and truly in play here.
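To separate misreading from miscalculation, the shapes the models thought they saw can be checked directly. This is a minimal sketch that takes the side lengths exactly as each model reported them in the transcripts above; the variable names are illustrative.

```python
import math


def heron(a: float, b: float, c: float) -> float:
    """Area of a triangle from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


MM2_PER_M2 = 1_000_000

# o3-mini-high's reading: a triangle with sides 7500, 6860 and 1500 mm.
o3_triangle = heron(7500, 6860, 1500) / MM2_PER_M2

# Claude's reading: a triangle with sides 7150, 9410 and 6860 mm.
claude_triangle = heron(7150, 9410, 6860) / MM2_PER_M2

# Claude's later attempt: base 6860 mm with 1500 mm treated as the height.
claude_base_height = 0.5 * 6860 * 1500 / MM2_PER_M2

for label, area in [
    ("o3-mini-high triangle", o3_triangle),
    ("Claude triangle", claude_triangle),
    ("Claude base x height", claude_base_height),
]:
    print(f"{label}: {area:.3f} m^2")
```

This should print roughly 4.844, 24.406 and 5.145 m^2, which is within rounding of the figures the models reported. In other words, the arithmetic more or less held up for the shape each model believed it was measuring; the reading of the diagram is where things went wrong.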
Interestingly, if I rephrase the question as text, the models do much better. For example:
I've got a shape with sides of length 1500, 9410, 6860 and 7150 mm. There is a right angle between the 1500 and 9410 lengths. What's the area of the shape? Use code.
Given this prompt, Claude produces the correct answer in ~20 seconds. Note that I’ve done two things:
Removed irrelevant information (the distracting .5m²).
Told the model to directly use code to solve the problem (this is a maths problem so using a tool - code - is likely to be quicker and more reliable).
This works with o3-mini-high as well.
When I use text, my attempts to distract the models fail:
I've got a shape with sides of length 1500, 9410, 6860 and 7150 mm. There is a right angle between the 1500 and 9410 lengths. What's the area of the shape? In the centre of the shape there is the text "24.62m2".
Claude correctly calculates the size and then notes:
The text in the center of the shape says "24.62m²" which doesn't match my calculated area of 31.512 m².
This discrepancy suggests that either:
The shape might have a different configuration than I've understood
The text might be referring to a different measurement (perhaps the area excluding some portion)
There could be a measurement or calculation error
Prompting matters
It’s still very much the case that, just as with people, how you brief a model matters. How you ask the question influences the answer. And sometimes in surprising ways.
This is a simple real-world example. If you believed the benchmarks you’d assume all frontier models would ace it. But they don’t. They get distracted. And confused. They need a human to clearly articulate the problem. And remove distractions.
For now Gemini has a clear lead, but Claude and OpenAI will catch up. And while they may ace simple questions like this in future, for more complicated ones your briefing will continue to matter.
One other observation. No model ever asks for help. In one run Claude spent nearly four minutes going around in circles. Why didn’t it pause to ask for help? Perhaps to seek confirmation about its assumptions? That’s what we humans do when we get stuck - will AI models start to do that in future when they realise they are stuck? Is there a future model architecture where Claude initiates a second session to review the results of a struggling initial session? Multiple instances of Claude working together to unstick themselves?
For now, the difference between frustration and amazement still often depends on how we frame our questions. The human skill of asking good questions may be more valuable than ever in the age of AI.
And as for my workshop? I'm sticking with my calculation of 31.5m². It turns out I can still do more than dig holes…



