MIT News - AI · Feb 9
Study: Platforms that rank the latest LLMs can be unreliable
A new study finds that online platforms ranking large language models (LLMs) can be unreliable: removing just a small portion of the crowdsourced data can significantly change the rankings. The finding highlights potential vulnerabilities in how AI model performance is publicly evaluated and compared.