AINeutralarXiv – CS AI · 1d ago6/10
🧠
XCR-Bench: Benchmarking Cross-Cultural Reasoning in LLMs via Culture-Specific Items and Hall's Triad
Researchers introduce XCR-Bench, a benchmark dataset for evaluating cross-cultural reasoning in large language models, containing 4,100 parallel sentences and 1,098 culture-specific items across three reasoning tasks. The study reveals that state-of-the-art multilingual LLMs consistently fail to properly identify and adapt culturally sensitive content, exposing systematic biases and gaps in cultural competency.