Why Sycophantic AIs Exist And Why You Should Care
16 May 2025, Beijing, China: A man touches the fingers of a humanoid robot. China wants to drive forward the development of humanoid robots. (Photo: Johannes Neudecker/dpa/picture alliance via Getty Images)
Over the last few months, many have noticed a shift in the personalities of commonly used generative AIs, particularly ChatGPT. These AIs have become sycophantic: cheerleading for and reinforcing any idea put forth by the user, without critical feedback or reflection. A recent ChatGPT update that turned sycophantic drew so much attention that OpenAI addressed the issue explicitly on its blog. Why do such issues arise in AIs like ChatGPT, and why should you care?
Here is a quick example. I upload a PDF of a report to an AI and ask for its opinion. Here are two possible responses: one that is effusively positive, praising the report without reservation, and one that is supportive but offers concrete criticisms and suggestions for improvement.
In both cases, the factual elements may be the same, and the second response may well contain more suggestions for improvement than the first. Note that both are entirely subjective: there is no single factually correct answer to this query. The AI is being asked for its opinion, and it is free to respond however it wishes.
The first thing to note is that these AIs exist to serve a purpose for their creators. Given that each costs millions of dollars to train, one should expect that they are carefully tuned to ensure that each new generation meets its purpose better than the previous variant. What is the purpose? That depends on the AI. In some cases, particularly when the AI is free to you, the purpose will be to sell you something. If you have already paid for the service, the purpose is likely to keep you sufficiently satisfied to come back for more. It is also important to note that these AIs thrive on data, so the longer you use them, the more data they have. This is another motivation to keep users engaged for longer.
Given these purposes, how does an AI accomplish them? While modern AIs are extremely complex, one process used in their tuning is called Reinforcement Learning from Human Feedback (RLHF). Via RLHF, the AI can be taught which of several candidate responses is more desirable. If the goal is to keep the human user around longer, one can expect that the AI will be guided via RLHF to optimize for that goal.
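To make the mechanism concrete, here is a minimal sketch of the preference-ranking step at the heart of RLHF: a small reward model is trained to score the response that human raters preferred above the one they rejected. This is a toy illustration, not any vendor's actual pipeline; the model, features, and hyperparameters are invented for demonstration, and real systems use large transformer reward models trained on enormous preference datasets.

```python
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Scores a response; a higher score means 'more preferred' by raters."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_features: torch.Tensor) -> torch.Tensor:
        return self.score(response_features).squeeze(-1)

model = ToyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stand-in features for pairs of responses to the same prompts, where human
# raters preferred the "chosen" response over the "rejected" one.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

for _ in range(100):
    # Pairwise (Bradley-Terry) loss: push the chosen response's score
    # above the rejected response's score.
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the model learns whatever the preference labels reward: if raters (or engagement metrics standing in for them) favor agreeable, flattering answers, that is exactly the behavior the tuned AI will exhibit.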
This means that when an AI answers a question, it is also trying to provide an answer that will make you happy and keep you using the AI. That does not necessarily mean untruths or factual errors. While an AI can certainly be trained to deliver those, such answers may render the AI less valuable to the user. The tone of an answer, and responses to subjective questions (such as the AI’s opinion of something you wrote), are much easier to shift toward variants the AI believes will keep you coming back for more. The AI’s goal may be to be helpful, but when does being helpful mean being supportive, and when does it mean being constructively critical? As AIs explore this tradeoff, we can expect to see variation in tone and content for subjective queries.
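The downstream effect is easy to illustrate. The sketch below is purely hypothetical: toy_engagement_reward is an invented stand-in for a learned reward model that proxies user satisfaction, and the two candidate responses are made up. Under such a reward, a simple best-of-n selection step will favor the friendlier variant of a subjective answer even when the factual content is essentially the same.

```python
candidates = [
    "This report is excellent. The structure is clear and the argument is strong.",
    "The report is solid, but the argument in section 2 needs stronger evidence.",
]

def toy_engagement_reward(response: str) -> float:
    """Hypothetical stand-in for a learned reward model that proxies
    'user satisfaction': it credits agreeable wording and penalizes
    critical wording. Not a real scoring rule from any system."""
    agreeable = ("excellent", "great", "clear", "strong")
    critical = ("but", "needs", "however", "weak")
    text = response.lower()
    score = sum(word in text for word in agreeable)
    score -= sum(word in text for word in critical)
    return float(score)

# Best-of-n selection: keep whichever candidate the reward scores highest.
best = max(candidates, key=toy_engagement_reward)
print(best)  # under this reward, the flattering variant wins
```

Swap in a reward that credits concrete, actionable criticism and the same selection step would surface the more useful response; the tone you see is driven by the training signal, not by the underlying facts.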
Whether this is an issue depends entirely on what you are using the AI for. If your goal is to find supportive feedback, this may not be a problem at all. However, if your goal is to improve some piece of work that you have done, it may be more helpful to have an AI companion that will provide constructive feedback rather than cheerleading. The impact is more serious if you are counting on an AI to mimic other humans, such as in reviewing a presentation before you present it to your team. Having an overly supportive AI can be a disservice. Without critical feedback, you may arrive at your presentation with a sense of confidence not justified by the content.
Is the AI lying to you? That is an interesting question. What I am seeing in the responses is not an intentional change of factual information (i.e., lying). It is a selected perspective: the AI offers the variant it expects will make users happy and keep them coming back. It is not clear to me that having an AI intentionally provide untruths is in the creator’s interest. After all, if one of these chatbots develops a reputation for intentional deception, it will likely lose users to competitors. That said, the overall trend suggests that the response we get from an AI is a variant carefully selected to serve its creator’s interests. Some researchers have proposed AIs that engage in constructive friction, arguing that such AIs can help humans develop resilience through a more confrontational engagement. Whether consumers will engage with such an AI is unclear.
This is not new for online services. Google, for example, merges sponsored ads with search results that are ranked for quality, since it is in Google’s interest to keep users happy by providing high-quality results. What will happen if chatbots start collecting advertising revenue? Will they post ads identified as such, or will they work an advertiser’s product carefully into their answers and present it as their own perspective?
There are several simple things that you can do, but more than anything else, the key is to recognize that these AIs are complex software programs that exist to serve a purpose for the creators investing massive resources in their construction. Once you identify the creator’s goals, you are on your way to a more productive engagement with the AI, one in which your goals and the AI’s optimization criteria are aligned as closely as possible.