arXiv – CS AI
Policy-Grounded Safety Evaluation of 20 Large Language Models
Researchers introduced Aymara AI, a programmatic platform for evaluating the safety of large language models, and used it to test 20 commercially available LLMs across 10 safety domains. Performance varied widely, with safety scores ranging from 52.4% to 86.2%, and the results exposed critical vulnerabilities in privacy and impersonation protection.