arXiv – CS AI · 10h ago
Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models
Researchers benchmarked five frontier LLMs against human players in games of Cards Against Humanity, finding that while the models exceed random-baseline performance, their humor preferences align poorly with humans' but strongly with one another's. The findings suggest that LLM humor judgment may reflect systematic biases and structural artifacts rather than genuine understanding of human preferences.