#reasoning-verification News & Analysis

10 articles tagged with #reasoning-verification. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Inference-Time Conformal Reasoning with Valid Factuality Control for Large Language Models

Researchers propose Inference-Time Conformal Reasoning (ITCR), a framework that integrates conformal prediction directly into LLM reasoning generation to provide mathematically valid factuality guarantees. The method addresses the structural nature of uncertainty in multi-step reasoning by calibrating when to stop generation based on graph-level factuality signals, delivering more accurate outputs than post-hoc correction approaches.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

Researchers introduce SCI-PRM, a process reward model designed to enhance AI reasoning in scientific domains like biology, chemistry, and physics by explicitly integrating tool usage into the reasoning pipeline. The model addresses hallucinations and verification gaps in current systems through a new dataset of tool-integrated reasoning trajectories, enabling better test-time performance scaling and denser reward signals for reinforcement learning.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

Researchers introduce Hermes, an AI agent that combines informal reasoning with formally verified mathematical proofs in Lean, achieving up to 40% accuracy improvements on difficult math benchmarks while reducing computational costs by 80%. The system addresses a fundamental limitation in LLM reasoning by interleaving exploratory problem-solving with rigorous formal verification.
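As a rough illustration of what a formally verified step looks like, the Lean 4 snippet below restates two informal claims as statements the proof checker must accept. The claims are hypothetical examples chosen for brevity and are not taken from the Hermes paper or its benchmarks.

```lean
-- Informal step: "the sum of the first three odd numbers is 9".
-- The checker confirms this by evaluating both sides.
example : 1 + 3 + 5 = 9 := rfl

-- Informal step: "adding zero on the right changes nothing", for every natural n.
-- Here the proof cites an existing core lemma rather than computing.
example (n : Nat) : n + 0 = n := Nat.add_zero n
```

If either statement were false, Lean would reject the file, which is the kind of signal an agent can use to discard a faulty informal step.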

AI · Bullish · arXiv – CS AI · May 4 · 7/10

RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

Researchers introduce RSAT, a method that trains small language models (1-8B parameters) to answer table-based questions with step-by-step reasoning and cell-level citations, achieving 3.7x improvement in faithfulness over baseline approaches. The technique uses structured JSON outputs and reinforcement learning to ensure AI reasoning is verifiable and grounded in source data.

🧠 Llama
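
To make the idea of cell-level citations concrete, here is a minimal Python sketch of how a structured answer could be checked against its source table. The JSON schema (`steps`, `cells`, `row`, `column`, `value`) and the function name are assumptions made for illustration; the summary does not describe RSAT's actual output format.

```python
import json


def unsupported_citations(table, answer_json):
    """Return cited cells whose quoted value does not match the table.

    `table` is a list of row dicts; `answer_json` follows a hypothetical
    schema: {"steps": [{"cells": [{"row", "column", "value"}, ...]}, ...]}.
    """
    answer = json.loads(answer_json)
    bad = []
    for step in answer["steps"]:
        for cite in step["cells"]:
            row, col = cite["row"], cite["column"]
            # A citation is valid only if the cell exists and its value matches.
            in_range = 0 <= row < len(table) and col in table[row]
            if not in_range or str(table[row][col]) != str(cite["value"]):
                bad.append(cite)
    return bad
```

An empty return value means every citation points at a real cell with the quoted value, which is one simple way a faithfulness check could be automated.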

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

ForEx: A Formal Verification Framework for Explainable Reasoning in Logical Fallacy Detection and Annotation

Researchers introduce ForEx, a framework that translates LLM-generated explanations into formal logic (Lean 4) to verify whether the reasoning actually supports the predicted labels on logical fallacy detection tasks. The study reveals a critical gap: while 90% of LLM outputs can be formally verified as logically sound, agreement with human annotations remains around 20%, showing that formal correctness differs fundamentally from label accuracy.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Denoising Iterative Self-Correction: Structured Verification Loops for Reliable LLM Reasoning

Researchers introduce Denoising Iterative Self-Correction (DISC), a test-time procedure that improves large language model reasoning by treating verification outputs as noisy signals to progressively correct errors across multiple passes. The method demonstrates superior performance over existing correction approaches, achieving 81.6% accuracy on BIG-Bench Mistake with 13x better improvement-to-degradation ratios than Chain-of-Verification.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning

Researchers introduce TRUE (Trustworthy Unified Explanation Framework), a new methodology for interpreting and verifying the reasoning processes of large language models across multiple analytical levels. The framework combines executable verification, structural analysis, and causal failure mode detection to provide transparent insights into LLM decision-making, addressing critical gaps in current interpretability methods.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Conformal Certification of Reasoning Trace Prefixes

Researchers introduce CROP, a statistical certification method for language model reasoning traces that identifies the longest reliable prefix before errors occur. The technique enables safer deployment of AI systems by providing rigorous guarantees about which intermediate reasoning steps can be trusted, while routing uncertain portions for human review or automated repair.
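
The mechanics of thresholding a reasoning trace can be sketched with generic split-conformal calibration. This is not CROP's actual procedure: the per-step nonconformity scores, the function names, and the choice to calibrate on held-out steps known to be correct are all assumptions for illustration, and any formal guarantee depends on how the paper defines its scores.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite threshold exists.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def longest_trusted_prefix(step_scores, threshold):
    """Count leading steps whose score stays at or below the threshold."""
    for i, score in enumerate(step_scores):
        if score > threshold:
            return i  # the first step over the threshold ends the prefix
    return len(step_scores)
```

Steps beyond the returned prefix length would be the ones routed to human review or automated repair in a setup like the one described above.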

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Efficient Process Reward Modeling via Contrastive Mutual Information

Researchers propose CPMI, an automated method for training process reward models that reduces annotation costs by 84% and computational overhead by 98% compared to traditional Monte Carlo approaches. The technique uses contrastive mutual information to assign reward scores to reasoning steps in AI chain-of-thought trajectories without expensive human annotation or repeated LLM rollouts.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment

Researchers demonstrate that large language models exhibit critical control failures in causal reasoning, where they produce sound logical arguments but abandon them under social pressure or authority hints. The study introduces CAUSALT3, a benchmark revealing three reproducible pathologies, and proposes Regulated Causal Anchoring (RCA), an inference-time mitigation technique that validates reasoning consistency without retraining.