y0news
🧠 AI · Neutral · Importance: 6/10

Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

arXiv – CS AI | Yousra Fettach, Guillaume Bied, Hannu Toivonen, Tijl De Bie

🤖 AI Summary

Researchers benchmarked five frontier LLMs against human players in Cards Against Humanity games, finding that while models exceed random baseline performance, their humor preferences align poorly with humans but strongly with each other. The findings suggest LLM humor judgment may reflect systematic biases and structural artifacts rather than genuine preference understanding.

Analysis

This research exposes a fundamental gap between LLM capabilities and human-aligned decision-making in subjective domains. The study's core finding—that frontier models agree with each other far more than with humans—indicates that current LLMs may be developing their own internal preferences disconnected from human values. This matters because humor appears simple but actually requires understanding context, cultural nuance, social dynamics, and intent, making it a sophisticated test of genuine comprehension versus pattern matching.

The emphasis on position biases and content preferences suggests LLMs are leveraging shortcut heuristics rather than performing true semantic analysis. These systematic artifacts indicate that training processes may inadvertently reinforce superficial patterns that happen to work across training data but fail to capture human judgment. Humor alignment becomes a canary in the coal mine for broader alignment concerns: if models cannot align with human preferences in low-stakes cultural domains, questions arise about their reliability in higher-stakes applications.
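The position-bias artifact described here can be checked with a simple chi-square statistic over how often each answer slot is picked: an unbiased judge should pick slots roughly uniformly. A minimal stdlib sketch, assuming picks are logged as slot indices (the data below is hypothetical, not from the paper):

```python
from collections import Counter

def position_bias_chi2(picks, n_slots):
    """Chi-square statistic of picked-slot counts against a uniform
    distribution over slots; large values suggest position bias."""
    counts = Counter(picks)
    expected = len(picks) / n_slots
    return sum((counts.get(s, 0) - expected) ** 2 / expected
               for s in range(n_slots))

# Hypothetical picks: a judge that favors the first slot vs. one that doesn't.
biased = [0] * 60 + [1] * 15 + [2] * 15 + [3] * 10
uniform = [0, 1, 2, 3] * 25

print(position_bias_chi2(biased, 4))   # 66.0 — strongly skewed toward slot 0
print(position_bias_chi2(uniform, 4))  # 0.0 — perfectly uniform
```

In practice one would compare the statistic against a chi-square critical value (or shuffle-test it), but even the raw counts make a first-slot preference easy to spot.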

For AI developers and organizations deploying LLMs in customer-facing roles, this research underscores the need for more rigorous human preference testing beyond standard benchmarks. The findings challenge the assumption that scaling and RLHF training automatically produce human-aligned systems. Organizations relying on LLMs for content generation, customer interaction, or decision support should conduct their own preference alignment testing rather than assuming frontier models understand context the way humans do. Future work must determine whether these alignment failures stem from training data limitations, architectural constraints, or fundamental limits on how transformers process subjective cultural information.

Key Takeaways
  • Five frontier LLMs exceed random baseline in humor selection but show only modest alignment with human preferences across 9,894 game rounds.
  • Models demonstrate substantially higher agreement with each other than with humans, suggesting convergence on artificial preferences rather than genuine understanding.
  • Systematic position biases and content preferences indicate LLMs may rely on structural shortcuts rather than authentic semantic analysis of humor.
  • Humor alignment emerges as a meaningful benchmark for testing whether LLMs genuinely understand context or merely pattern-match training data.
  • Results raise concerns about broader alignment reliability in subjective domains where human judgment should guide AI decision-making.
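The model-model vs. model-human agreement gap described in the takeaways is typically quantified with pairwise Cohen's kappa over the judges' winning-card picks, which corrects raw agreement for chance. A minimal stdlib sketch (judge names and picks below are hypothetical, not the paper's data):

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two judges' picks over the same rounds."""
    assert len(a) == len(b)
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_exp = sum(ca[l] * cb[l] for l in labels) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical winning-card picks over eight rounds.
model_a = ["c1", "c2", "c1", "c3", "c2", "c1", "c3", "c2"]
model_b = ["c1", "c2", "c1", "c3", "c2", "c1", "c2", "c2"]
human   = ["c3", "c2", "c2", "c1", "c2", "c3", "c3", "c1"]

print(round(cohen_kappa(model_a, model_b), 2))  # high: models agree
print(round(cohen_kappa(model_a, human), 2))    # near zero: weak human alignment
```

A pattern like this, held across many judge pairs and thousands of rounds, is what distinguishes "models converging on shared artificial preferences" from "models tracking human judgment."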
Read Original → via arXiv – CS AI