AINeutralarXiv – CS AI · 7h ago6/10
🧠
Geometric Metrics and LLMs: What They Measure and When They Work
Researchers systematically tested geometric metrics for evaluating large language models, finding that several popular metrics like Schatten Norm and MOM primarily measure output length rather than quality. While geometric metrics add modest discriminative value beyond standard text statistics for tasks like generator identification, they show inconsistent correlation with actual text quality measures.