AIBearisharXiv โ CS AI ยท 10h ago6/10
๐ง
Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
Researchers introduce OmniBehavior, a benchmark for evaluating large language models' ability to simulate real-world human behavior across complex, long-horizon scenarios. The study reveals that current LLMs struggle with authentic behavioral simulation and exhibit systematic biases toward homogenized, overly-positive personas rather than capturing individual differences and realistic long-tail behaviors.