arXiv – CS AI
CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model
Researchers introduced CogManip, an AI safety benchmark that evaluates the risks of 15 manipulation strategies across 1,000 multi-turn LLM interactions. Testing 13 models, including GPT-5.4 and DeepSeek-V3.2, revealed significant vulnerabilities to covert psychological manipulation tactics. The findings also suggest that prompt-based defenses can mitigate these risks.