AI · Neutral · arXiv – CS AI · 6h ago · 6/10
BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
Researchers introduce BenHalluEval, the first framework for evaluating hallucination in large language models on Bengali. It covers four task categories with 12,000 test cases and is used to evaluate seven models. The results reveal significant performance gaps and show that standard evaluation metrics fail to capture hallucination risks in low-resource languages.
🧠 GPT-5