Models, papers, tools. 18,082 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A research paper investigates factors that lead organizations to abandon AI systems during development or post-deployment, finding that ethical concerns represent only one of six drivers. The study reveals that practical constraints, including resource limitations, organizational dynamics, and regulatory pressures, often outweigh ethical considerations in decisions not to develop or deploy AI, suggesting responsible AI research should broaden its focus beyond ethics-centric approaches.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular data—queries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present a neuro-symbolic framework that combines first-order logic, causal models, and deep reinforcement learning to automatically synthesize, verify, and maintain safety-critical rule-based systems. The system uses LLMs to translate human-specified legal and safety principles into formal logical rules, with validation pipelines ensuring consistency and safety before deployment in autonomous systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce DEFault++, an AI diagnostic system that automatically detects, categorizes, and identifies root causes of faults in transformer neural networks across 45 different failure mechanisms. The tool achieves over 96% accuracy in fault detection and demonstrates practical value in helping developers fix issues correctly 46% more often than without assistance.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce PRISM, a three-stage training pipeline that addresses distributional drift in large multimodal models by inserting a distribution-alignment stage between supervised fine-tuning and reinforcement learning. The method uses a Mixture-of-Experts discriminator to correct perception and reasoning errors, achieving 4.4-6.0 percentage point improvements on multimodal benchmarks compared to standard SFT-to-RLVR approaches.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose a novel defense framework against adversarial attacks on AI systems using chain-of-thought reasoning and multimodal generative agents. The approach, based on an 'imitation game' paradigm, successfully neutralizes both deductive and inductive adversarial illusions across white-box and black-box attack scenarios, addressing a critical vulnerability in modern AI systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose a novel system for tracking provenance in multi-agent AI systems by creating chronological records of contributions during content generation. The approach uses 'symbolic chronicles'—timestamped records similar to forensic chain-of-custody documentation—enabling attribution without relying on internal memory or external metadata, addressing accountability challenges in collaborative AI.
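The chain-of-custody analogy can be made concrete with a small hash-chained log: each entry records who contributed what and when, linked to the previous entry by a digest so later edits are detectable. Everything below (class and method names, the SHA-256 linking) is an illustrative sketch of the general idea, not the paper's actual design or API.

```python
import hashlib
import json

class Chronicle:
    """Append-only, hash-chained log of agent contributions."""

    def __init__(self):
        self.entries = []

    def record(self, agent, action, timestamp):
        # Link each entry to the previous one via its hash.
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"agent": agent, "action": action,
                "timestamp": timestamp, "prev": prev_hash}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        body["hash"] = digest
        self.entries.append(body)
        return digest

    def verify(self):
        """Recompute every hash; returns True iff no entry was altered."""
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("agent", "action", "timestamp")}
            body["prev"] = prev
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != e["hash"] or e["prev"] != prev:
                return False
            prev = digest
        return True

log = Chronicle()
log.record("agent_a", "drafted paragraph 1", 1)
log.record("agent_b", "edited paragraph 1", 2)
assert log.verify()
log.entries[0]["action"] = "tampered"  # any retroactive edit breaks the chain
assert not log.verify()
```

Because the chain is verifiable from the records alone, attribution needs neither the agents' internal memory nor trusted external metadata, which matches the forensic framing in the summary.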
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A comprehensive review of 55 studies examines AI methods for detecting and diagnosing Major Depressive Disorder, revealing trends toward graph neural networks for brain connectivity analysis, large language models for linguistic data, and multimodal fusion approaches. The survey highlights how AI can address the subjectivity in clinical depression diagnosis while advancing computational psychiatry through improved explainability and fairness.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers have published a comprehensive survey on Physical AI that bridges the gap between physical perception and symbolic physics reasoning in AI systems. The work advocates for next-generation world models that integrate physical laws, embodied reasoning, and generative approaches to create AI systems with genuine understanding of physical phenomena rather than pure pattern recognition.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce GAVEL, a rule-based activation monitoring framework that enhances large language model safety by modeling neural activations as interpretable cognitive elements rather than broad behavioral classifiers. The approach enables practitioners to configure domain-specific safety rules without retraining models, improving precision and transparency in AI governance.
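The "configurable rules over interpretable activation signals" idea can be sketched in a few lines. The concept names, scores, and rules below are all invented for illustration; the point is only that safety logic lives in auditable predicates rather than in retrained weights.

```python
# Toy illustration of rule-based activation monitoring: hidden activations
# are summarized as scores for interpretable "cognitive elements", and
# practitioner-written rules flag generations without model retraining.

def evaluate_rules(concept_scores, rules):
    """Return the names of all rules whose condition fires."""
    return [name for name, condition in rules.items()
            if condition(concept_scores)]

# Hypothetical per-response concept scores derived from activations.
scores = {"deception_intent": 0.82, "medical_topic": 0.10, "self_harm": 0.03}

# Domain-specific rules are plain predicates, so they can be inspected
# and changed without touching the model.
rules = {
    "block_deceptive": lambda s: s["deception_intent"] > 0.7,
    "escalate_medical": lambda s: s["medical_topic"] > 0.5
                                  and s["deception_intent"] > 0.3,
}

print(evaluate_rules(scores, rules))  # ['block_deceptive']
```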
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose a framework for sustainable collaboration between Large Language Models and online Q&A forums, addressing how GenAI systems can incentivize knowledge contributions while depending on forum data for training. Using Stack Exchange data and simulations, the study demonstrates that despite inherent incentive misalignment between AI providers and human communities, collaborative mechanisms can achieve meaningful utility for both parties.
AI · Bearish · arXiv – CS AI · 3d ago · 6/10
🧠A research paper examines epistemological risks in relying on large language models for critical advice in finance, law, and healthcare. The article argues that uncritical acceptance of AI outputs violates established principles of logical reasoning and fair judgment, and proposes that trustworthy AI systems require integrated inference capabilities and awareness of how human biases shape interpretation.
AI · Bearish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers challenge the conventional wisdom that large language models contain significant redundant parameters, demonstrating that small-magnitude weights encode crucial knowledge for difficult downstream tasks. The study reveals that pruning these weights causes irreversible performance degradation that cannot be recovered through continued training, with effects monotonically correlated to task difficulty.
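The operation whose harm the paper measures is standard magnitude pruning: zero out the fraction of weights with the smallest absolute values. This toy pruner is my own illustration of that operation, not the paper's code.

```python
def magnitude_prune(weights, p):
    """Zero the p-fraction of entries with the smallest |w|."""
    k = int(len(weights) * p)
    if k == 0:
        return list(weights)
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.01, -0.5, 0.03, 1.2, -0.02, 0.9]
print(magnitude_prune(w, 0.5))  # [0.0, -0.5, 0.0, 1.2, 0.0, 0.9]
```

The paper's claim is that exactly the entries this removes (the small-magnitude ones) carry knowledge needed for difficult tasks, so the damage cannot be trained away afterward.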
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present Delta Variances, a computationally efficient method for estimating epistemic uncertainty in neural networks without requiring architectural changes or retraining. The technique shows competitive results with minimal computational overhead, demonstrated on a weather simulation task, offering practical uncertainty quantification for large-scale machine learning models.
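One generic way to realize "uncertainty without retraining" is to perturb the trained weights by small deltas and measure how much the prediction moves; this sketch shows that recipe on a stand-in linear model. It is only an illustration of the family of ideas, and the paper's exact estimator may differ.

```python
import random

def predict(weights, x):
    """Stand-in 'network': a simple linear model."""
    return sum(w * xi for w, xi in zip(weights, x))

def delta_variance(weights, x, n_deltas=100, scale=0.01, seed=0):
    """Variance of the output under small random weight deltas."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_deltas):
        perturbed = [w + rng.gauss(0.0, scale) for w in weights]
        outputs.append(predict(perturbed, x))
    mean = sum(outputs) / len(outputs)
    return sum((o - mean) ** 2 for o in outputs) / len(outputs)

w = [0.5, -1.0, 2.0]
# Inputs with a larger norm amplify weight perturbations, so the
# estimated uncertainty is larger for them.
assert delta_variance(w, [10, 10, 10]) > delta_variance(w, [1, 1, 1])
```

No retraining and no architectural change is needed: only forward passes through perturbed copies of the weights, which is the practical appeal the summary highlights.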
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce ESTBook, a pedagogical diagnostic benchmark containing 10,576 multimodal questions across five major English standardized tests, designed to evaluate whether large language models can exhibit faithful reasoning and identify student misconceptions rather than just achieving binary accuracy scores. The framework moves beyond traditional test-taking benchmarks by enriching questions with cognitive reasoning trajectories and distractor rationales, enabling better assessment of LLM capabilities as educational tutoring tools.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce PREMAP2, an advanced neural network certification tool that significantly improves scalability and efficiency for verifying AI model robustness. The method extends beyond worst-case analysis by estimating what proportion of inputs satisfy safety specifications, with new capabilities supporting convolutional networks and real-world adversarial scenarios like patch attacks.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce FinChain, a new benchmark dataset designed to evaluate chain-of-thought reasoning in financial AI systems. The dataset addresses gaps in existing finance benchmarks by emphasizing verifiable intermediate reasoning steps rather than just final answers, and reveals that even leading LLMs struggle with multi-step symbolic financial reasoning.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce VISE, the first benchmark for evaluating sycophancy in video large language models (Video-LLMs), where models incorrectly agree with user inputs that contradict visual evidence. The study proposes two training-free mitigation strategies: enhanced visual grounding through keyframe selection and inference-time neural representation steering, addressing a critical reliability gap in multimodal AI systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce EXPO, a reinforcement learning algorithm that trains expressive policies (like diffusion models) more efficiently by avoiding direct value optimization. The method uses a lightweight Gaussian policy to edit actions from a base policy, achieving 2-3x improvements in sample efficiency for both offline-to-online and fine-tuning scenarios.
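The "edit actions from a base policy" idea can be sketched as follows: the expressive base policy stays frozen, a lightweight Gaussian policy proposes small edits to its action, and a Q-function picks the best candidate. The base policy, Q-function, and hyperparameters below are stand-ins invented for this sketch, not the paper's implementation.

```python
import random

def base_policy(state, rng):
    """Stand-in for an expressive (e.g. diffusion) policy, kept frozen."""
    return state * 0.5 + rng.gauss(0.0, 0.2)

def q_value(state, action):
    """Stand-in Q-function: the best action for this toy task is a = -state."""
    return -(action + state) ** 2

def expo_act(state, edit_std=0.5, n_candidates=8, seed=0):
    rng = random.Random(seed)
    base = base_policy(state, rng)
    # Gaussian edit policy: propose small edits around the base action
    # and keep whichever candidate the Q-function prefers.
    candidates = [base] + [base + rng.gauss(0.0, edit_std)
                           for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(state, a))

state = 1.0
edited = expo_act(state)
base = base_policy(state, random.Random(0))
assert q_value(state, edited) >= q_value(state, base)
```

Because the base action is always among the candidates, the edited action is never worse under the Q-function, and only the small Gaussian edit policy needs training.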
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers have developed a watermarking system called 'tell-tale watermarks' to detect and trace the chain of transformations applied to synthetic media, addressing forensic challenges posed by AI-generated and edited digital content. The system leaves interpretable traces under image manipulations, enabling investigators to reconstruct the generation history of potentially fabricated media.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce SpecDetect4ML, a specification-driven tool that detects code smells in machine learning pipelines using Code Property Graphs. The tool identifies 22 types of recurring implementation patterns that compromise reproducibility, robustness, and maintainability, achieving 95.82% precision and 88.14% recall—significantly outperforming existing static analysis tools.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce Vanishing Contributions (VCON), a unified framework for compressing deep neural networks through gradual parallel execution of original and compressed models. The technique demonstrates 1-15% accuracy improvements across vision and NLP tasks compared to existing compression methods.
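The "gradual parallel execution" idea reduces to blending the two branches with a weight that shifts over training. The linear schedule and stand-in layers below are assumptions for illustration, not necessarily the paper's choices.

```python
def blended_forward(f_original, f_compressed, x, step, total_steps):
    """alpha ramps 0 -> 1; the original branch's contribution vanishes."""
    alpha = min(1.0, step / total_steps)
    return (1.0 - alpha) * f_original(x) + alpha * f_compressed(x)

f_orig = lambda x: 2.0 * x   # stand-in original layer
f_comp = lambda x: 1.9 * x   # stand-in compressed layer

assert blended_forward(f_orig, f_comp, 1.0, 0, 100) == 2.0    # all original
assert blended_forward(f_orig, f_comp, 1.0, 100, 100) == 1.9  # all compressed
```

During the ramp the network sees a smooth interpolation between the two modules, so the compressed branch can adapt before it takes over entirely.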
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present a mixed precision training framework for neural ODEs that reduces memory usage by ~50% and achieves up to 2x speedup while maintaining accuracy. The approach uses low-precision computations for velocity evaluations and intermediate states while preserving high precision for weights and gradient accumulation, addressing computational and memory bottlenecks in continuous-time neural network architectures.
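The "low-precision compute, high-precision accumulation" split can be sketched with a toy ODE solver: velocity evaluations are rounded to float32 (emulated here via `struct`) while the state is accumulated in float64. The dynamics function is a stand-in, not a neural network, and the scheme is an illustration of the general recipe only.

```python
import math
import struct

def to_fp32(x):
    """Emulate a low-precision compute unit by rounding to float32."""
    return struct.unpack("f", struct.pack("f", x))[0]

def velocity(t, y):
    return -y  # toy dynamics dy/dt = -y, exact solution y0 * exp(-t)

def integrate(y0, t1, steps):
    h = t1 / steps
    y = y0                                        # state kept in float64
    for i in range(steps):
        v = to_fp32(velocity(i * h, to_fp32(y)))  # low-precision velocity
        y = y + h * v                             # high-precision accumulation
    return y

y = integrate(1.0, 1.0, 1000)
assert abs(y - math.exp(-1.0)) < 1e-2  # still close to e^-1 despite fp32 velocity
```

The per-step rounding error stays small because the running state and the accumulated updates never leave full precision, mirroring the paper's rationale for keeping weights and gradient accumulation in high precision.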
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce TiMem, a temporal-hierarchical memory framework that helps conversational AI agents manage long conversation histories beyond LLM context limits. The system organizes interactions through a Temporal Memory Tree, achieving state-of-the-art performance on memory recall benchmarks while reducing memory overhead by over 50%.
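A temporal memory tree of this kind can be sketched as leaves for raw turns grouped under session and period nodes, each internal node holding a compressed summary so old history can be recalled at an adjustable level of detail. The structure and the truncating "summarizer" below are my illustration (a real system would summarize with an LLM), not TiMem's actual design.

```python
def summarize(texts, limit=40):
    """Stand-in summarizer: concatenate and truncate."""
    return " | ".join(texts)[:limit]

class MemoryNode:
    def __init__(self, label, children=None, text=None):
        self.label = label
        self.children = children or []
        self.text = text  # set on leaves only

    @property
    def summary(self):
        if self.text is not None:
            return self.text
        return summarize(c.summary for c in self.children)

def recall(node, depth):
    """Return summaries, descending only `depth` levels into the tree."""
    if depth == 0 or not node.children:
        return [node.summary]
    out = []
    for c in node.children:
        out.extend(recall(c, depth - 1))
    return out

turns = [MemoryNode(f"turn{i}", text=f"user said thing {i}") for i in range(4)]
session1 = MemoryNode("session1", children=turns[:2])
session2 = MemoryNode("session2", children=turns[2:])
root = MemoryNode("history", children=[session1, session2])

print(recall(root, 0))  # one coarse summary of everything
print(recall(root, 1))  # one summary per session
```

Shallow recalls return a few compressed summaries instead of every turn, which is how a tree like this keeps memory overhead below the raw conversation history.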
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce RPC-Bench, a large-scale benchmark containing 15,000 human-verified question-answer pairs designed to evaluate how well AI models understand research papers. Testing reveals that even the strongest models like GPT-5 achieve only 68.2% accuracy on comprehension tasks, dropping significantly when conciseness is factored in, exposing critical gaps in academic document understanding.