The continuing story of travels with my AI
Meeting Claudine
Over the past week we’ve continued to explore Canada - with help from Claude, o3 and ChatGPT. They’ve explained Canadian history, suggested walking tours, advised on restaurants, offered driving directions and more.
And a few themes have become apparent.
First, they are hard taskmasters. Ask them for a schedule and they'll keep us busy all day long - and more. Breaks are few and far between. They are overly optimistic about how long it takes to travel between places. And overly optimistic about what teenage boys will enjoy. But the constant stream of ideas is useful - saying 'no' to a suggestion is better than never having heard it.
Second, I seem to be addicted to making Claude artefacts. Here are just a few:
An interactive history of a key battle in Canadian history: Battle of the Plains of Abraham.
A maths multiplication quiz that my son and I played against each other while waiting for dinner one evening. Never let it be said we don’t know how to have fun…
A simplified version of the Myers-Briggs personality test to pass the time while we were waiting for a concert to start.
And many more…
Being able to create little interactive apps to explain concepts, history - or just entertain is amazing. Coding simple apps is now essentially free - the limit is your imagination. I’ve not counted, but I’m confident I’ve generated more code on this holiday than in decades of writing code as a software engineer. That’s a little sobering…
But the hallucinations remain. Some are easy to spot. For example, take this walking map of Quebec City from ChatGPT. At first glance it looks impressive. But you don't need to know Quebec City to spot the multiple Place Royales, the twin ferry terminals, the mismatched scale, the submerged Musee de la Civilization, the confused direction arrows…
Then there’s this:
This is the lawn in front of Canada's parliament in Ottawa. It's used for the changing of the guard ceremony. ChatGPT has helpfully circled a good place to stand to watch the ceremony. And it's right - it would be a good place. Because it's right in the middle of where the military band stands. At best, attempting to stand there would result in a polite request to move; at worst you might get arrested.
Now you might be thinking - this isn't surprising for ChatGPT. But Claude gets stuck as well. Take this photo of a tank. There's no information board. Could Claude work out what type it is?
It told me this:
But my human tank expert disagreed. They confidently told me it was an M24 Chaffee. So I asked Claude again:
Ah. Progress.
But let’s try another type to work out how confident Claude really is:
And then:
At least Claude realised it was struggling:
Enter Claudine
As our trip continued I found myself wondering - could Claude build a replacement for itself that included fact-checking? Claude artefacts recently introduced support for calling the Claude API from within an artefact - so maybe this could work?
And it turns out it can. Enter Claudine - Claude’s more careful and self-aware sister (you’ll need to be logged into your Claude account to use this artefact).
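The core idea is simple enough to sketch. The structure below is my guess at how something like Claudine could work, not Claudine's actual code: one call answers the question, then a second, fresh call - with no memory of the first - critiques the answer and rates its confidence. Inside an artefact the completion call would be Claude's own API; here `complete` is an injected function (a hypothetical stand-in) so the logic is self-contained.

```javascript
// Hypothetical sketch of an answer-then-verify loop like Claudine's.
// `complete` is any async function that takes a prompt string and
// returns the model's reply as a string.
async function answerWithConfidence(question, complete) {
  // First pass: answer the question normally.
  const answer = await complete(`Answer concisely: ${question}`);

  // Second pass: a separate call, with no memory of the first,
  // reviews the answer. Lack of memory is what keeps it impartial.
  const review = await complete(
    `Question: ${question}\n` +
    `Proposed answer: ${answer}\n` +
    `List any factual errors, then rate confidence LOW, MEDIUM or HIGH.`
  );

  // Pull the confidence rating out of the review text.
  const match = review.match(/\b(LOW|MEDIUM|HIGH)\b/);
  return { answer, confidence: match ? match[1] : "UNKNOWN", review };
}
```

The weakness, as we'll see, is that the whole scheme stands or falls on the second call actually being able to spot errors in the first.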
There’s also a confidence information panel:
Ask about restaurants and you get a lower confidence reply:
Cool. So Claude can check its own homework. And a lack of memory and self-awareness makes it impartial.
But this approach relies on the models being able to spot errors. And that ability seems patchy. Take the walking map of Quebec City - what mistakes can they spot?
Claude spotted spelling mistakes but missed all the fundamental errors:
ChatGPT was similarly hopeless:
o3 did better, noticing multiple “Place Royales”. But then messed up by claiming the scale bar and route looked fine.
And so
Approaches like Claudine are a useful first step towards spotting errors. But the models still fail to spot errors that are trivially obvious to humans. Getting AIs to verify AI output depends on the models being able to spot errors - and the best models available today often can't.
In many ways current AI reminds me of my teenagers. Confident. Full of bravado. But occasionally wrong. The bravado and self-confidence hide a deeper lack of self-awareness of their limitations. Their weaknesses. And just like them AI is powerful enough to be useful but unreliable enough to be dangerous.
Ultimately AI tools need to mature. Learn their limitations. Be willing to admit them. We need tools that tell us when they are unsure. But that’s not where we are - yet.
For now our job is to be the parent. Keep a close eye on our digital teenagers. Be amazed by all the incredible things they can do. Learn their strengths and weaknesses. And maintain a healthy sense of scepticism.