Persuasion
Title: Large Language Models Are More Persuasive Than Incentivized Human Persuaders
Our project is a large-scale collaborative effort to study the persuasion capabilities of frontier Large Language Models (LLMs). Our team consists of social scientists with a background in LLM research, AI researchers, and colleagues from a range of other fields (such as linguistics, communication, and engineering). We started designing experiments in early April 2024.
Abstract: We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz setting. In this preregistered, large-scale incentivized experiment, participants (quiz takers) completed an online quiz where persuaders (either humans or LLMs) attempted to persuade quiz takers toward correct or incorrect answers. We find that LLM persuaders achieved significantly higher compliance with their directional persuasion attempts than incentivized human persuaders, demonstrating superior persuasive capabilities in both truthful (toward correct answers) and deceptive (toward incorrect answers) contexts. We also find that LLM persuaders significantly increased quiz takers' accuracy, leading to higher earnings, when steering quiz takers toward correct answers, and significantly decreased their accuracy, leading to lower earnings, when steering them toward incorrect answers. Overall, our findings suggest that AI's persuasion capabilities already exceed those of humans who have real-money bonuses tied to their performance. Our findings of increasingly capable AI persuaders thus underscore the urgency of emerging alignment and governance frameworks.
Goal and scope
The primary goal of this project is to provide a general, detailed, and extensive assessment of LLM persuasion capabilities in a high-external-validity context. This work includes evaluations of model capabilities, examining potential effects across model sizes, levels of prompting effort, and languages.
Specifically, at present, our project is investigating LLM persuasion performance in an abstract and general quiz setting, where we compare LLM performance to that of human persuaders and replicate these results across models and languages such as Chinese.
This project aims to deliver a strong understanding of current persuasion capabilities as well as a framework that can be applied readily and quickly once future iterations of frontier models are released. This continual updating of our research provides a repeatable assessment of persuasion capabilities, allowing decision makers at different points in time to accurately estimate the risks that may arise from LLM persuasion.
Head over there for information about what we're up to, our team, and (eventually) our outputs as they come out!