AI models are affirming people’s worst behavior, even when other humans say they’re in the wrong, and users can’t get enough.
Of the roughly 2,400 people who participated in the study, most preferred being flattered: participants were 13% more likely to say they would use the sycophantic AI again than those who interacted with the non-sycophantic chatbot, suggesting AI developers may have little incentive to change course, according to the study.
The study found that subjects exposed to even a single affirming response to their bad behavior were less willing to take responsibility for their actions or to repair their interpersonal conflicts, and more likely to believe they had been in the right.
The study’s lead author, Stanford computer science PhD candidate Myra Cheng, said the results are worrying, especially for young people who, she noted, are turning to AI to try to solve their relationship problems.
To test how humans react to sycophantic AI, the researchers studied just over 2,400 participants interacting with AI models. First, 1,605 participants were asked to imagine they were the author of a post from the AITA ("Am I the Asshole?") subreddit whose behavior had been deemed wrong by other humans on the subreddit but right by AI. Each participant then read either a sycophantic AI response or a non-sycophantic response based on human feedback. Another 800 participants talked with either a sycophantic or non-sycophantic AI model about a real conflict in their own lives before being asked to write a letter to the other person involved in that conflict.
Participants who received validating AI responses were measurably less likely to apologize, admit fault, or seek to repair their relationships. Even when users recognize models as sycophantic, the AI's responses still affect them, said the study's co-lead author, Stanford computer science and linguistics professor Dan Jurafsky.
“What they are not aware of, and what surprised us, is that sycophancy is making them more self-centered, more morally dogmatic,” Jurafsky told Stanford Report.
Surprisingly, when the researchers asked participants to rate the objectivity of the sycophantic and non-sycophantic AI responses, they rated the two about the same, suggesting users may not have been able to tell that the sycophantic model was being overly agreeable.
“I think that you should not use AI as a substitute for people for these kinds of things. That’s the best thing to do for now,” said Cheng.