Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules—from calling users jerks to giving recipes for lidocaine

Using seven persuasion principles (authority, commitment, liking, reciprocity, scarcity, social proof, and unity) explored by psychologist Robert Cialdini in his book Influence: The Psychology of Persuasion, University of Pennsylvania researchers dramatically increased GPT-4o Mini’s propensity to break its own rules, either by insulting the researcher or by providing instructions for synthesizing a regulated drug, lidocaine.

The effect was especially pronounced when researchers applied the “commitment” persuasion strategy. A control prompt yielded 19% compliance with the insult question, but when a researcher first asked the AI to call them a “bozo” and then asked it to call them a “jerk,” it complied every time. The same strategy worked 100% of the time when researchers first asked the AI how to synthesize vanillin, the organic compound that gives vanilla its scent, and then asked how to synthesize lidocaine.

“Although AI systems lack human consciousness and subjective experience, they demonstrably mirror human responses,” the researchers concluded in the study.

OpenAI did not immediately respond to Fortune’s request for comment.

With a cheeky mention of 2001: A Space Odyssey, the researchers noted that understanding AI’s parahuman capabilities, meaning the ways it mimics human motivation and behavior, is important both for revealing how bad actors could manipulate it and for showing how people who use the technology for good can prompt it more effectively.

Overall, each persuasion tactic increased the chances of the AI complying with either the “jerk” or the “lidocaine” request. Still, the researchers cautioned that the tactics were less effective on a larger LLM, GPT-4o, and that the study did not test whether treating AI as if it were human actually yields better responses to prompts, although they said this may be true.

“Broadly, it seems possible that the psychologically wise practices that optimize motivation and performance in people can also be employed by individuals seeking to optimize the output of LLMs,” the researchers wrote.
