AI · Neutral · arXiv (CS AI) · 5h ago
GLEAN: Grounded Lightweight Evaluation Anchors for Contamination-Aware Tabular Reasoning
Researchers propose GLEAN, an evaluation protocol for testing small AI models on tabular reasoning tasks while accounting for contamination and hardware constraints. The framework reveals distinct error patterns across models and provides diagnostic tools for more reliable evaluation under limited computational resources.