y0news
#cross-cultural1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 6h ago1
๐Ÿง 

LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations

Researchers introduce LiveCultureBench, a new benchmark that evaluates large language models as autonomous agents in simulated social environments, testing both task completion and adherence to cultural norms. The benchmark uses a multi-cultural town simulation to assess cross-cultural robustness and the balance between effectiveness and cultural sensitivity in LLM agents.