#reasoning-robustness News & Analysis

2 articles tagged with #reasoning-robustness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

Researchers introduce LGMT, a testing framework that uses first-order logic to generate logically equivalent test cases and so evaluate the reasoning reliability of large language models (LLMs). The study finds that state-of-the-art LLMs fail consistency checks under semantic transformations, exposing reasoning defects that traditional benchmarks miss.
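
To make the idea of a logically equivalent test case concrete, here is a minimal sketch of a metamorphic consistency check. It is not LGMT's implementation: it uses propositional logic rather than first-order logic to stay short, and the names `equivalent`, `consistent`, and `keyword_stub` are hypothetical, with `keyword_stub` standing in for a real LLM call.

```python
from itertools import product
from typing import Callable, Dict, List

Formula = Callable[[Dict[str, bool]], bool]

def equivalent(f: Formula, g: Formula, variables: List[str]) -> bool:
    """Return True if f and g agree on every truth assignment."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) != g(env):
            return False
    return True

# p -> q, e.g. "if it rains, the ground is wet"
implication: Formula = lambda env: (not env["p"]) or env["q"]
# not q -> not p (contrapositive): logically equivalent to p -> q
contrapositive: Formula = lambda env: env["q"] or (not env["p"])
# q -> p (converse): NOT equivalent, so it is not a valid transformation
converse: Formula = lambda env: (not env["q"]) or env["p"]

def consistent(model: Callable[[str], str], prompt_a: str, prompt_b: str) -> bool:
    """Metamorphic check: equivalent prompts should receive the same answer."""
    return model(prompt_a) == model(prompt_b)

# Hypothetical stand-in for an LLM that keys on surface wording.
def keyword_stub(prompt: str) -> str:
    return "yes" if "If it rains" in prompt else "no"

prompt_a = "If it rains, the ground is wet. It rains. Is the ground wet?"
prompt_b = "If the ground is not wet, it did not rain. It rains. Is the ground wet?"

# Only build a test pair from a transformation that preserves meaning.
if equivalent(implication, contrapositive, ["p", "q"]):
    print("consistent:", consistent(keyword_stub, prompt_a, prompt_b))
```

The equivalence check runs before the model is queried, so any disagreement between the two answers can be attributed to the model rather than to a faulty transformation.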

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models

Researchers introduce CombEval, a dynamic benchmark framework for evaluating how well large language models handle combinatorial counting problems. Testing 11 LLMs reveals significant brittleness in handling ordered objects, indistinguishable elements, and nested dependencies, with code-augmented approaches showing modest improvements over direct reasoning.
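
The distinctions named in the summary (ordered versus unordered objects, indistinguishable elements) change the correct count even for tiny inputs. The sketch below is not taken from CombEval; it shows one way a reference answer could be computed by brute-force enumeration and cross-checked against a closed-form formula. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations
from math import comb, factorial, perm

def distinct_arrangements_brute(items: str) -> int:
    """Count arrangements by enumeration; repeated letters are indistinguishable."""
    return len(set(permutations(items)))

def distinct_arrangements_formula(items: str) -> int:
    """Multinomial count: n! divided by the factorial of each letter's multiplicity."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)
    return total

def ordered_selections_brute(n: int, k: int) -> int:
    """Choose k of n distinct objects where order matters."""
    return sum(1 for _ in permutations(range(n), k))

def unordered_selections_brute(n: int, k: int) -> int:
    """Choose k of n distinct objects where order does not matter."""
    return sum(1 for _ in combinations(range(n), k))

# Cross-check enumeration against the closed forms on small inputs.
assert distinct_arrangements_brute("BANANA") == distinct_arrangements_formula("BANANA")
assert ordered_selections_brute(4, 2) == perm(4, 2)
assert unordered_selections_brute(4, 2) == comb(4, 2)
```

Enumeration is only practical for small inputs, but it gives an independent check on a formula, which is the sort of verification that running code allows and direct reasoning does not.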