AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers present a hybrid neuro-symbolic architecture that combines formal logic with neural semantic analysis to verify LLM outputs in high-stakes domains like healthcare. The system reports hallucination detection rates of over 83% for structured data and 72% for semantic fabrications while reducing report creation time by 30%, demonstrating practical safeguards for deploying LLMs in data-sensitive applications.
AI · Bullish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduce AgentV-RL, an agentic verifier framework that enhances reward modeling for large language models by combining bidirectional reasoning agents with tool-use capabilities. The system addresses critical limitations in LLM verification by enabling forward and backward tracing of solutions, achieving 25.2% performance gains over existing methods and positioning agentic reward modeling as a promising new paradigm.
AI · Bullish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers propose a two-stage LLM framework that uses one model to translate XAI technical outputs into natural language and a second model to verify accuracy, faithfulness, and completeness before delivering explanations to users. The framework includes iterative refinement mechanisms and demonstrates improved reliability across multiple XAI techniques and LLM families.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present DeepSciVerify, an LLM-based system that verifies scientific claims against cited evidence by combining abstract-level analysis with selective full-text passage retrieval. The two-stage pipeline achieves 86.7% accuracy on benchmarks while reducing computational overhead by avoiding unnecessary full-text analysis in 67% of cases, addressing a critical reliability issue in AI-generated scientific content.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce LegalGraphRAG, a framework that combines hierarchical graph structures with multi-agent verification to improve legal reasoning in AI systems. The approach addresses critical limitations in applying retrieval-augmented generation to legal domains by organizing heterogeneous legal knowledge at multiple abstraction levels and implementing transparent, audited reasoning processes.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce EHR-ReasonCon, a benchmark dataset, and EHR-Inspector, an LLM-based framework designed to verify consistency between unstructured clinical notes and structured data in Electronic Health Records. The work addresses a critical gap in healthcare data quality by moving beyond simple value matching to capture clinical reasoning, temporal relationships, and event interpretations that reflect real-world documentation practices.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers demonstrate that LLMs can be used as lossless encoders and decoders for invertible problems in hardware design, significantly reducing hallucinations and omissions. By generating HDL code from Logic Condition Tables and reconstructing the original tables to verify accuracy, the approach improves developer productivity and catches both AI-generated errors and design specification flaws.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers analyzed how LLM verifiers assess solution correctness in test-time scaling scenarios, revealing that verification effectiveness varies significantly with problem difficulty, generator strength, and verifier capability. The study demonstrates that weak generators can nearly match stronger ones post-verification and that verifier scaling alone cannot solve fundamental verification challenges.