AI · Neutral · arXiv – CS AI · 14h ago · 6/10
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
Researchers introduced UA-Legal-Bench, a five-task benchmark that evaluates large language models on Ukrainian legal reasoning and draws on 99.5 million court decisions. The study points to gaps in LLM evaluation for morphologically rich, non-Latin-script languages and shows that standard accuracy metrics can mask poor performance on imbalanced legal tasks.
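The point about accuracy masking poor performance on imbalanced tasks can be illustrated with a small sketch. The data below is synthetic and the label names (`rejected`, `granted`) are hypothetical; none of it comes from UA-Legal-Bench, and the summary does not say which alternative metrics the authors report. Macro-averaged F1 is used here only as one common imbalance-aware metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic, illustrative data: 90% of cases share one outcome.
y_true = ["rejected"] * 90 + ["granted"] * 10
# A degenerate "model" that always predicts the majority class.
y_pred = ["rejected"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy setup the always-majority predictor reaches 0.900 accuracy while its macro-F1 is about 0.474, because it never identifies a single minority-class case.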