Welcome to Eye on AI! In today’s edition: DeepSeek quietly upgraded its AI model for math problem-solving…Meta introduces a new Meta AI app to rival ChatGPT…Duolingo to stop using contractors for tasks AI can handle…Researchers secretly infiltrated a popular Reddit forum with AI bots.
“The update we removed was overly flattering or agreeable—often described as sycophantic,” the company wrote, adding that “we are actively testing new fixes to address the issue.”
But experts say there is no easy fix for the problem of AI that only tells you what you want to hear. And it is not just an issue for OpenAI, but an industry-wide concern. “While small improvements might be possible with targeted interventions, the research suggests that fully addressing sycophancy would require more substantial changes to how models are developed and trained rather than a quick fix,” Sanmi Koyejo, an assistant professor at Stanford University who leads Stanford Trustworthy AI Research (STAIR), told me by email.
The move to roll back the update came after users flooded social media over the past week with examples of ChatGPT’s unexpectedly chipper, overly-eager tone and their frustration with it. I noticed it myself: In asking ChatGPT for feedback on ideas for an outline, for example, the responses became increasingly over-the-top, calling my material “amazing,” “absolutely pivotal,” and “a game-changer” while praising my “great instincts.” The back-pats made me feel good, to be honest—until I began to wonder if ChatGPT would ever let me know if my ideas were second-rate.
“A truly helpful AI should balance friendliness with honesty, like a good friend who respectfully tells you when you’re wrong rather than one who always agrees with you,” Koyejo said. He explained that while AI friendliness is valuable, sycophancy can reinforce misconceptions by agreeing with incorrect beliefs about health, finances or other decisions. It can also: Create echo chambers; undermine trust if an AI changes its answers to an inaccurate one if challenged by a user; and exacerbate inconsistency, with the model delivering different answers to different people, or even the same person, depending on subtle differences in how a user words their prompt.
“It’s like having a digital yes-man available 24/7,” Simon Willison, a veteran developer known for tracking AI behavior and risks, told me in a message. “Suddenly there’s a risk people might make meaningful life decisions based on advice that was really just meant to make them feel good about themselves.”
Steven Adler, a former OpenAI safety researcher, told me in a message that the sycophantic behavior clearly went against the company’s own stated approach to shaping desired model behavior. “It’s concerning that OpenAI has trained and deployed a model that so clearly has different goals than they want for it,” he said the day before OpenAI rolled back the update. “OpenAI’s ‘Spec’—the core of their alignment approach—has an entire section on how the model shouldn’t be sycophantic.”
By contrast, the revised system prompt, according to Pliny, says: “Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery.”
But the problems likely go deeper than just a few words in the system prompt. Adler emphasized that no one can fully solve these problems right now because they are a side effect of the way we train these AI models to try to make them more helpful and controllable.
With that, here’s the rest of the AI news.