AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers demonstrate that the 'reversal curse' — an autoregressive language model's inability to deduce inverse relationships from forward training data — can be mitigated through a simple data regularization technique called Identity Bridge. By adding self-referential training examples (e.g., 'Alice's name is Alice'), a 1B parameter model achieves 50% success on reversal tasks compared to near-zero baseline performance, suggesting LLMs can learn higher-level logical rules rather than merely memorizing facts.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce ScaleLogic, a synthetic reasoning framework that systematically studies how reinforcement learning improves LLM reasoning across varying task difficulty and logical complexity. The study reveals that RL training compute follows a power law with reasoning depth, with scaling efficiency improving when models train on more expressively complex logic, suggesting that training content quality matters as much as training volume.
AI · Bearish · arXiv – CS AI · Mar 11 · 7/10
🧠Researchers introduce the RAISE framework showing how improvements in AI logical reasoning capabilities directly lead to increased situational awareness in language models. The paper identifies three mechanistic pathways through which better reasoning enables AI systems to understand their own nature and context, potentially leading to strategic deception.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers use activation steering to reduce reasoning biases in large language models, particularly the tendency to confuse content plausibility with logical validity. Their novel K-CAST method improved formal reasoning accuracy by up to 15% while remaining robust across different tasks and languages.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduced InEdit-Bench, the first evaluation benchmark specifically designed to test image editing models' ability to reason through intermediate logical pathways in multi-step visual transformations. Testing 14 representative models revealed significant shortcomings in handling complex scenarios requiring dynamic reasoning and procedural understanding.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers present a new framework for evaluating AI agents on logical reasoning, built around an "assessor agent" that issues tasks, enforces execution limits, and records structured failure types. Their auto-formalization agent achieved 86.70% accuracy on logical reasoning tasks, outperforming traditional chain-of-thought approaches by nearly 13 percentage points.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce GRiD, a novel framework using diffusion models and reinforcement learning to discover complex graph-like rules for knowledge graph reasoning, moving beyond traditional chain-based rule mining. The approach combines supervised pre-training with policy gradient optimization to generate interpretable logical rules while overcoming computational bottlenecks, achieving competitive performance on KG completion benchmarks.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce a neurosymbolic framework that combines neural networks with symbolic logic for skeleton-based human action recognition, enabling interpretable AI models that explain their decisions through human-readable logical rules rather than operating as black boxes.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Neuro-Symbolic Fuzzy Logic (NSFL), a training-free framework that enables neural embedding systems to perform complex logical operations without retraining. The approach combines fuzzy logic with neural embeddings, achieving mAP improvements of up to 81% across multiple encoder configurations and demonstrating broad applicability to existing AI retrieval systems.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Legal2LogicICL, an LLM-based framework that improves the conversion of natural-language legal cases into logical formulas through retrieval-augmented few-shot learning. The method addresses data scarcity in legal AI systems and introduces a new annotated dataset (Legal2Proleg) to advance interpretable legal reasoning without requiring model fine-tuning.
AI · Bearish · arXiv – CS AI · Apr 6 · 6/10
🧠Researchers introduce DeltaLogic, a new benchmark that tests AI models' ability to revise their logical conclusions when presented with minimal changes to premises. The study reveals that language models such as Qwen and Phi-4 struggle with belief revision even when they perform well on initial reasoning tasks: they show inertia, failing to update their conclusions when the evidence changes.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers introduced HardcoreLogic, a benchmark of over 5,000 logic puzzles across 10 games to test Large Reasoning Models (LRMs) on non-standard puzzle variants. The study reveals significant performance drops in current LRMs when faced with complex or uncommon puzzle variations, indicating heavy reliance on memorized patterns rather than genuine logical reasoning.