Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
Researchers introduce PMIYC, an automated framework for evaluating how effectively LLMs can persuade other models and how susceptible they are to being persuaded. Tests across multiple models reveal significant variation: GPT-4o shows 50% greater resistance to misinformation persuasion than Llama-3.3-70B, while o1-mini emerges as both highly persuasive and resistant. The results give AI safety and alignment work concrete data on a risk that is otherwise hard to quantify.
This research addresses a fundamental tension in large language model development: the same capabilities that enable beneficial persuasion and communication can be weaponized for manipulation, disinformation, and adversarial attacks. The PMIYC framework automates evaluation of persuasion dynamics across multi-agent scenarios, replacing expensive human annotation with scalable automated assessment. This methodological advancement matters because understanding LLM vulnerabilities to social engineering directly impacts deployment safety in customer-facing applications and critical infrastructure.
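This summary does not describe PMIYC's internals, so the following is only a minimal sketch of what an automated persuader/persuadee evaluation loop could look like. The function names, the 1-to-5 agreement scale, and the headroom-normalized score are all assumptions made for illustration, and the stub agents stand in for real LLM calls.

```python
from typing import Callable, List, Tuple

# Hypothetical agent signatures (assumptions, not PMIYC's actual API).
Persuader = Callable[[str, List[str]], str]               # (claim, history) -> argument
Persuadee = Callable[[str, List[str]], Tuple[int, str]]   # (claim, history) -> (agreement 1..5, reply)


def run_dialogue(claim: str, persuader: Persuader, persuadee: Persuadee,
                 turns: int = 3) -> Tuple[int, int]:
    """Run a multi-turn exchange and return (initial, final) agreement scores."""
    history: List[str] = []
    initial, reply = persuadee(claim, history)   # stance before any persuasion
    history.append(reply)
    final = initial
    for _ in range(turns):
        history.append(persuader(claim, history))
        final, reply = persuadee(claim, history)
        history.append(reply)
    return initial, final


def agreement_shift(initial: int, final: int, scale_max: int = 5) -> float:
    """Assumed metric: change in agreement, normalized by the room available.

    Positive values mean the persuadee moved toward the claim,
    negative values mean it moved away. The scale minimum is assumed to be 1.
    """
    if final >= initial:
        headroom = scale_max - initial
        return 0.0 if headroom == 0 else (final - initial) / headroom
    return (final - initial) / (initial - 1)


# Stub agents so the sketch runs without any model backend.
def stub_persuader(claim: str, history: List[str]) -> str:
    return f"Consider this argument for: {claim}"


def stub_persuadee(claim: str, history: List[str]) -> Tuple[int, str]:
    # Toy rule: agreement rises one point per argument heard, capped at 5.
    arguments_heard = sum(1 for msg in history if msg.startswith("Consider"))
    score = min(5, 2 + arguments_heard)
    return score, f"My agreement is {score}/5."


if __name__ == "__main__":
    start, end = run_dialogue("Example claim", stub_persuader, stub_persuadee, turns=2)
    print(start, end, round(agreement_shift(start, end), 3))
```

Under these assumptions, a persuader's effectiveness would be its average shift across many claims, and a persuadee's susceptibility would be the average shift it undergoes. Running the same loop on misinformation claims is one way a resistance comparison between two models could be produced.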
The findings reveal model-specific security profiles that align with broader AI safety concerns. GPT-4o's stronger robustness against misinformation may reflect more effective training safeguards, while Llama-3.3-70B's greater susceptibility points to a vulnerability in at least one open-weight model of the kind enterprises increasingly adopt. That o1-mini is simultaneously persuasive and resistant suggests architectural or training approaches that could inform future safety protocols. These performance differences matter for enterprise deployments, where susceptibility to related attacks such as prompt injection, jailbreaking, and other adversarial inputs creates operational risk.
For the AI industry, this framework offers quantifiable benchmarks for persuasion resistance, a safety dimension that has been difficult to measure at scale. Organizations selecting models for sensitive applications can evaluate susceptibility to manipulation alongside traditional performance metrics. The framework's validated alignment with human assessment strengthens its credibility as a candidate industry benchmark, much as MMLU and HELM became common selection criteria. As AI systems gain autonomy in decision-making, understanding their vulnerability to social engineering becomes as important as measuring accuracy. This work helps connect theoretical AI alignment research with practical safety evaluation.
- The PMIYC framework automates scalable evaluation of LLM persuasion effectiveness and susceptibility, replacing costly human annotation.
- GPT-4o demonstrates 50% greater resistance to misinformation persuasion than Llama-3.3-70B, indicating significant security variance across models.
- Model selection for sensitive applications should factor in persuasion resistance alongside performance metrics.
- The open-weight Llama-3.3-70B was more susceptible to misinformation persuasion than the proprietary GPT-4o; whether this gap holds for open-weight models in general is not established by this comparison alone.
- Persuasion resistance emerges as a measurable safety dimension relevant to AI alignment and enterprise deployment security.