AIBearisharXiv – CS AI · 18h ago7/10
🧠
LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
Researchers introduce LGMT, a novel testing framework that uses first-order logic to evaluate Large Language Models' reasoning reliability by creating logically equivalent test cases. The study reveals that state-of-the-art LLMs fail consistency checks under semantic transformations, exposing hidden reasoning defects that traditional benchmarks miss.