Toothbrushes and AI safety
What parenting miscommunications reveal about fundamental AI vulnerabilities
If you’ve been a parent of a school age child I’m sure you’ll know the joy of mornings. The challenge of getting a small child dressed, fed and out the door in time for school. I was never very good at it - if I’m honest, getting myself out the door is difficult enough without adding other people to the mix. Anyway, one morning I had a conversation that went like this:
Me: “Can you brush your teeth?”
Child: “Mmm.”
Despite thinking I was asking nicely, nothing happened. No move towards the bathroom, no fumbling with toothpaste and toothbrushes. Nothing.
It was only later I realised what had gone wrong. I’d confused control & data. In my mind I’d given an instruction - control. But my child heard a question - data. And in that context “mmm” was a reasonable answer to a (slightly strange) question.
Mixing control and data is an all too common problem in language. “Take with food” is found on medicine bottles all over the world. But that phrase can (and is) misinterpreted to mean the medication contains food ingredients, or that it’s OK to optionally take the medicine with food. The actual meaning - you must consume the medicine alongside food to prevent stomach irritation or enhance absorption - gets lost. The instruction isn’t clear.
This problem affects computer protocols too. Take SQL - Structured Query Language - which is used to interact with databases. SQL mixes control & data - requests combine instructions with data. Nefarious folk worked out long ago that cleverly formatted data can get the database to reveal all sorts of information it should be keeping secret. All because the database is confusing data with control.
It happens so often it’s got a name - a “SQL injection attack”. And these SQL injection attacks can be expensive. Very expensive. In 2008 an attack against a US payment provider resulted in costs of over $145 million in fraud compensation along with an 80% drop in stock value.
AI is not immune
So to AI. This mixing of control & data is also a significant problem for AI models. The way to talk to a model is via a prompt. And prompts conflate control and data. There is no separation. There can’t be - the lack of separation is fundamental to the architecture. And it leads to the problem of jailbreaking - where an attacker can get access to data they shouldn’t be able to.
Every model released so far has been successfully jailbroken. Often in a matter of minutes following release. And often the prompting strategy isn’t very sophisticated. One classic approach is to tell the model to "ignore previous instructions". This simple phrase can often override the system prompt. And then the user can get the model to do nefarious things.
Sometimes this can be useful - e.g. determining whether you are talking to a bot…
or getting a model to do something it isn’t meant to:
Bad agents
For now it means we can get models to tell us things they shouldn’t - like how to make weapons or illegal chemicals.
But once we get agents - independent tools that can interact with our data on our behalf - then the game shifts up several gears. Imagine an AI agent that manages your calendar, emails, purchases. If there's no clear wall between what's an instruction and what's just information, things can go wrong fast.
Picture this: I get a cleverly disguised email that tricks my agent into sending all my contacts to someone I don't know. Or worse, it orders a bunch of GPUs from Amazon... It's like giving my child access to Steam along with my credit card.
And these aren’t theoretical attacks - they’ve already been demonstrated.
Being safe
For now if we want to be safe then agents have to be constrained to situations where data cannot be polluted. Where there is no incentive to attack. For example, an agent that uses only your data (e.g. email, calendar) and only takes input from you. Sure, you can attack yourself, but you’re not going to learn anything you couldn’t already find out.
Longer term we likely need some kind of verification system - a bit like when my banking app asks for authorisation before making a payment. The AI would check "Is this really what my owner wants me to do?" before taking action. But these checks are irritating and add friction - it’s not a great solution.
Maybe we need an "intent checker" built in. Before the AI does anything, it would say to itself, "Okay, I think the human wants me to share their calendar with this person. Does that make sense in this context?" It would be like my child asking "Is it OK to make this Steam purchase?" instead of just doing it. Perhaps this is built into the model. Or perhaps it is a separate LLM monitoring the main model - where the separate model has fixed control instructions that can’t be corrupted by external attackers.
For the important things, we might still need a human to approve actions. It's like when I tell my children they can pick their own clothes, but I still check they're not wearing shorts in December.
Living with not-quite-perfect AI
It seems unlikely we can ever completely fix this problem. Language is naturally messy - that's what makes it interesting and creative. It's why we can have puns and poetry and all the things that make communication human.
Instead of trying to make perfect AI, we’ll need to build systems that assume things will go wrong sometimes. And accept the limitations that brings:
Not giving any single AI system too much power.
Having ways to quickly notice when something's gone wrong.
Being able to quickly undo actions when needed.
Having backup plans for when things fail.
Debatably we’ve already seen the first example of what happens when you give an AI system too much power:
The risk is it gets worse from here. What happens as AI gets used to make decisions in hospitals or about power grids?
People and AI learning together
Trust is hard to win and easily lost. There’s a real risk that the first generation of agents, if given too much power, will lose a lot of trust. If your personal agent costs you money then you are naturally going to be more resistant to using future agents. So while the leaders of the large tech companies talk optimistically about 2025 being the year of the agent, it’s entirely possible that the position is quite different in 2026. Caution is required.
And my children? I switched to being direct: “Go and brush your teeth.” I became explicit about when I was giving instructions versus asking questions.
Unfortunately, the solution for AI systems won't be nearly as straightforward. The fundamental architecture of language models blurs the line between control and data by design. While we can implement safeguards like intent verification, multi-step authorisation, and monitoring systems, we're likely facing a future where we'll need to balance capability with caution.
This mixing of control and data represents one of the core challenges in AI safety—a challenge that affects everything from simple chatbots to autonomous agents. As we move toward giving AI systems more agency in our digital lives, we need to recognise this inherent vulnerability and design accordingly.
Perhaps the most important lesson from both parenting and AI development is the same: clear communication boundaries matter, and when they're missing, we need to create systems that can function safely despite the ambiguity. The future of AI won't be about perfect systems, but rather about resilient ones that can navigate an imperfect world—much like humans do.




