y0news
#model-brittleness
1 article
AI · Bearish · arXiv – CS AI · 7h ago · 7/10

Brittlebench: Quantifying LLM robustness via prompt sensitivity

Researchers introduce Brittlebench, a new evaluation framework showing that frontier AI models can lose up to 12% of their performance when faced with minor prompt variations such as typos or rephrasing. The study finds that semantics-preserving input perturbations can account for up to half of the variance in a model's performance, highlighting significant robustness problems in current language models.