AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
Researchers present a systematic evaluation of how well large language models reason about Boolean satisfiability (SAT) problems. They introduce a paired-formula protocol and an Accurate Differentiation Rate (ADR) metric, which together show that conventional accuracy metrics can be misleading: models often succeed through heuristics rather than genuine reasoning.