🧠 AI | ⚪ Neutral | Importance 6/10

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv – CS AI | Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song | March 6, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduce X-RAY, a system for analyzing the reasoning capabilities of large language models through formally verified probes that isolate structural components of reasoning. The study finds that LLMs handle constraint refinement well but struggle with solution-space restructuring, and the framework offers a contamination-free evaluation method.

Key Takeaways

→The X-RAY system uses formal probes to map LLM reasoning capabilities beyond task-level accuracy metrics.
→LLMs show asymmetric reasoning performance, handling constraint refinement better than solution-space restructuring.
→The framework can differentiate between models that appear similar on standard benchmarks.
→Formal calibration enables precise isolation of incremental structural information in reasoning tasks.
→The evaluation system is contamination-free and supports both training and testing of reasoning models.
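
To make the takeaways above more concrete, here is a minimal toy sketch of what a programmatically generated, machine-checkable probe could look like. It is not the X-RAY implementation: the `Probe` class, `make_probe`, `refine`, and the modular-arithmetic task are all hypothetical names and choices made for illustration. The sketch assumes that "contamination-free" can be approximated by generating fresh instances from a seed, and it uses "adding one consistent constraint" as a simple stand-in for constraint refinement.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: (modulus, remainder) means x % modulus == remainder.
Constraint = Tuple[int, int]


@dataclass(frozen=True)
class Probe:
    """Toy probe: find an integer x in [0, domain) meeting every constraint."""
    domain: int
    constraints: Tuple[Constraint, ...]

    def check(self, answer: int) -> bool:
        # Exact verifier: a model's answer is accepted only if it satisfies all constraints.
        in_range = 0 <= answer < self.domain
        return in_range and all(answer % m == r for m, r in self.constraints)

    def solutions(self) -> List[int]:
        # Brute-force ground truth; fine for the small domains used here.
        return [x for x in range(self.domain) if self.check(x)]

    def prompt(self) -> str:
        parts = [f"x mod {m} = {r}" for m, r in self.constraints]
        return (f"Find an integer x with 0 <= x < {self.domain} such that "
                + " and ".join(parts) + ".")


def make_probe(seed: int, domain: int = 200) -> Probe:
    """Generate a fresh instance from a seed, with a planted solution so it is satisfiable."""
    rng = random.Random(seed)
    hidden = rng.randrange(domain)
    moduli = rng.sample([3, 5, 7, 11], k=2)
    return Probe(domain, tuple((m, hidden % m) for m in moduli))


def refine(probe: Probe, seed: int) -> Probe:
    """Toy 'refinement': add one constraint that keeps at least one existing solution."""
    rng = random.Random(seed)
    keep = rng.choice(probe.solutions())
    used = {m for m, _ in probe.constraints}
    new_modulus = rng.choice([p for p in (3, 5, 7, 11, 13) if p not in used])
    return Probe(probe.domain, probe.constraints + ((new_modulus, keep % new_modulus),))


if __name__ == "__main__":
    base = make_probe(seed=0)
    refined = refine(base, seed=1)
    print(base.prompt())
    print(refined.prompt())
    print(len(base.solutions()), "->", len(refined.solutions()), "solutions")
```

Because each instance is built from a seed and verified by `check`, a model's answer can be scored exactly without relying on a fixed, possibly leaked answer key. Comparing accuracy on `base` versus `refined` probes would give a crude analogue of measuring the effect of one added piece of structure.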