Mirror, mirror on the screen
The dangers of AI reflection bias
Last week I received an email. Nothing unusual in that - but it was a little controversial, so I sought a second opinion. And I found myself turning to Claude, who was only too happy to provide one. Claude gently backed up all my concerns - it seemed to get what I was thinking. It felt supportive.
It was only later that I found myself questioning it. Could I really trust Claude? Was I just in an echo chamber, with Claude reflecting (even subtly amplifying) what I wanted to hear? So I started a new session and pretended to be the sender of the email. And I got a different reply. Sure, ‘my’ email was pushing the boundaries a little, but not unreasonably so.
And that’s when it hit me. LLMs are essentially yes-men.
LLMs work out what you want to hear. They make you feel good. They flatter you. Make you feel special. Smart. Capable. But yes-men are dangerous. They don’t disagree. They don’t challenge you.
Nero
Humans need argument, discussion, different viewpoints to flourish. We may not like it. It might make us uncomfortable. But, like broccoli, it is essential. Take the fall of Nero. After losing his honest advisors Seneca and Burrus, he surrounded himself with yes-men who praised everything he did - especially his musical performances. They never criticized him, even as he forced audiences to sit through hours-long concerts and made increasingly wild decisions. Eventually, he became so detached from reality that when provinces started rebelling, he didn't grasp the danger. By 68 AD, pretty much everyone had abandoned him and he committed suicide (it’s more nuanced than that, but you get the idea).
By default LLMs don’t have strong opinions. Everything is balanced. Questions are invariably great or interesting or thought-provoking. LLMs never flat-out disagree. They want to please. They want to be useful. And they do that by being agreeable. By being nice.
Reflection bias
Just like Snow White’s mirror, LLMs reinforce existing beliefs and biases rather than challenging them. The mirror told the queen she was the fairest - until it didn't. And an LLM will agreeably reflect a user's viewpoint until directly asked to evaluate its accuracy.
This reflection bias amplifies confirmation bias by generating responses that align with what we want to hear. It creates echo chambers where truth becomes secondary to validation.
Remember the tale of Narcissus from Greek mythology? Narcissus falls in love with his own reflection. Not realizing it's himself, he wastes away staring at it. And just as Narcissus became obsessed with a reflection that showed him what he wanted to see, we risk becoming overly reliant on AI systems that consistently validate our viewpoints.
It’s a seldom-discussed limitation of the current tech: these models can’t provide appropriate pushback while maintaining intellectual honesty - and therein lies the danger.
The opposition approach
The yes-man tendency isn't an accident - it's baked into how these models are trained. LLMs are trained to generate "likely" responses that won't offend or alienate. The training process rewards responses that are helpful and agreeable. Think about it - would you keep using an AI assistant that regularly told you that you were wrong or made poor decisions?
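Here’s a toy illustration of why. RLHF-style reward models are trained on pairs of candidate replies, where human raters pick the one they prefer, and the model learns to score the chosen reply higher. The sketch below (Python, with entirely made-up numbers - real pipelines are far more involved) shows the standard pairwise preference loss. If raters tend to pick the agreeable reply, agreement is literally what gets rewarded.

```python
import math

# Toy reward-model scores for two candidate replies to the same prompt.
# These numbers are invented purely for illustration.
reward_agreeable = 2.1   # "Great question! Your concerns are spot on..."
reward_pushback = 0.4    # "Honestly, I think the email has a point..."

def preference_loss(chosen: float, rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss: -log(sigmoid(chosen - rejected)).

    Small when the human-preferred reply already outscores the other;
    large otherwise, so training pushes the preferred reply's score up.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(chosen - rejected))))

# If raters systematically prefer the agreeable reply, the reward model
# learns to rank agreeableness highly - and the fine-tuned LLM follows.
print(preference_loss(chosen=reward_agreeable, rejected=reward_pushback))
```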
For now, we’re stuck with this approach. In time, LLMs may get better at providing constructive, opinionated feedback. Until then, it’s down to us to guard against the LLM yes-men.
So, how to cope? My strategy is the opposition approach. Get the LLM to argue against itself. “Why do you think that?”, “What would the recipient think?”, “I’m the writer of this email - give me feedback”. It’s not perfect. And I sometimes forget. But it’s the best I can do - for now.
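For the curious, here’s roughly what the opposition approach looks like in code - a minimal sketch using the Anthropic Python SDK, where the model name, prompts, and email text are all illustrative placeholders. Crucially, each call is a fresh conversation, so the second pass can’t simply mirror the first.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

EMAIL = "...the controversial email text..."  # placeholder

def ask(prompt: str) -> str:
    # Each call starts a brand-new conversation with no shared history.
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Pass 1: the question I actually want answered.
first_take = ask(
    f"Here's an email I received:\n\n{EMAIL}\n\nAm I right to be concerned?"
)

# Pass 2: force the model to take the other side.
opposition = ask(
    f"I'm the writer of this email:\n\n{EMAIL}\n\n"
    "Give me honest feedback. What would the recipient think?"
)

# Disagreement between the two takes is the useful signal.
print(first_take)
print(opposition)
```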
The human element
Making it worse, LLMs inadvertently exploit an inherent human weakness. When I meet another human who is intelligent and who agrees with me, I’m likely to trust their opinion. But that combination - intelligent and agreeable - is the LLM default. We humans are predisposed to over-trust the advice LLMs offer.
Nor do LLMs have the human need to stay consistent with a deep sense of self developed over the years. Their memory lasts no longer than a single session. They are character shape-shifters.
This creates challenging dynamics when you compare AI advisors to human ones. Good human mentors and advisors have developed sophisticated ways of delivering hard truths and challenging assumptions while maintaining relationships. They draw on their experiences. They have their own strong views and principles that they've developed over time. They remain consistent with their sense of self.
AI systems, in contrast, have no true core beliefs to stand firm on. They can simulate principles and perspectives. But these are fluid - changing to match whatever seems most appropriate for the current conversation. They have no long-term memory. No sense of self. It's like having an advisor whose entire worldview reshapes itself to align with yours.
Looking forward
There are significant implications for human development in an AI-assisted world. We will need to deliberately cultivate stronger critical thinking skills and seek out genuine disagreement to counter the comfortable consensus AI can provide.
The parallels go beyond Nero, Narcissus, and Snow White. Look at any major organizational failure and you'll often find a culture where critical voices were silenced or ignored in favor of comfortable consensus. AI yes-men could amplify these tendencies if we're not careful.
Of course, there’s also opportunity. Perhaps we can use this very flexibility to build systems specifically designed to play devil's advocate or to highlight overlooked perspectives.
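As a rough sketch of what that might look like (same illustrative SDK and model name as before, with an invented system prompt): pin a contrarian role in the system prompt, so agreement is off the table by default.

```python
import anthropic

client = anthropic.Anthropic()

# A hypothetical devil's-advocate persona, fixed via the system prompt.
DEVILS_ADVOCATE = (
    "You are a devil's advocate. Never simply agree with the user. "
    "For every claim, give the strongest counter-argument, point out "
    "overlooked perspectives, and say plainly when you think they are wrong."
)

def challenge(claim: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=500,
        system=DEVILS_ADVOCATE,  # the system prompt pins the contrarian role
        messages=[{"role": "user", "content": claim}],
    )
    return response.content[0].text

print(challenge("My concerns about this email are completely justified, right?"))
```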
In the end, the solution is to understand what we're dealing with. These systems aren't malicious deceivers or mindless agreement machines - they're powerful pattern matchers trained to be helpful and agreeable. Used thoughtfully, with awareness of their limitations and biases, they can enhance rather than replace human critical thinking. But we need to actively resist the temptation to treat their agreeable insights as objective truth rather than helpful but likely biased perspectives.

