Models, papers, tools. 18,994 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠ATANT v1.1 is a companion paper clarifying how existing memory and context evaluation benchmarks (LOCOMO, LongMemEval, BEAM, MemoryBench, and others) fail to measure 'continuity' as defined in the original v1.0 framework. The analysis finds that existing benchmarks cover a median of only 1 of the 7 required continuity properties, and the authors illustrate the measurement gap through comparative scoring: their system scores 96% on ATANT but only 8.8% on LOCOMO, showing that these benchmarks evaluate different capabilities.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers conducted a systematic study comparing Vision-Language Models built with LLAMA-1, LLAMA-2, and LLAMA-3 backbones, finding that newer LLM architectures don't universally improve VLM performance and instead show task-dependent benefits. The findings reveal that performance gains vary significantly: visual question-answering tasks benefit from improved reasoning in newer models, while vision-heavy tasks see minimal gains from upgraded language backbones.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Diffusion-CAM, a novel interpretability method designed specifically for diffusion-based Multimodal Large Language Models (dMLLMs). Unlike existing visualization techniques optimized for sequential models, this approach accounts for the parallel denoising process inherent to diffusion architectures, achieving superior localization accuracy and visual fidelity in model explanations.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce AI Integrity, a new governance framework that verifies the reasoning processes of AI systems rather than just evaluating outcomes. The approach defines an Authority Stack—a four-layer model of values, epistemological standards, source preferences, and data criteria—and proposes the PRISM framework to measure integrity through six core metrics, addressing a critical gap in existing AI Ethics, Safety, and Alignment paradigms.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce PRISM, a framework that detects AI behavioral risks by analyzing underlying reasoning hierarchies rather than individual harmful outputs. The system identifies 27 risk signals across value prioritization, evidence weighting, and information source trust, using forced-choice data from 7 AI models to distinguish between structurally dangerous, context-dependent, and balanced AI reasoning patterns.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠A large-scale empirical study of 679 GitHub instruction files shows that AI coding agent performance improves by 7-14 percentage points when rules are applied, but surprisingly, random rules work as well as expert-curated ones. The research reveals that negative constraints outperform positive directives, suggesting developers should focus on guardrails rather than prescriptive guidance.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers demonstrate a zero-shot knowledge graph construction pipeline using local open-source LLMs on consumer hardware, achieving 0.70 F1 on document relations and 0.55 exact match on multi-hop reasoning through ensemble methods. The study reveals that strong model consensus often signals collective hallucination rather than accuracy, challenging traditional ensemble assumptions while maintaining low computational costs and carbon footprint.
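The ensemble idea above can be sketched in a few lines: merge triples from several models by vote, and report the consensus fraction rather than trusting unanimity, since the study found that strong agreement can itself signal collective hallucination. This is a minimal illustration, not the paper's pipeline; the model outputs and triples are invented.

```python
from collections import Counter

def ensemble_triples(model_outputs, min_votes=2):
    """Merge (head, relation, tail) triples from several local LLMs
    by majority vote. Returns accepted triples mapped to a consensus
    fraction; the caveat from the study is that a fraction of 1.0
    (unanimous agreement) can reflect shared hallucination, so the
    score is surfaced instead of being treated as ground truth."""
    votes = Counter(t for out in model_outputs for t in set(out))
    n = len(model_outputs)
    return {t: c / n for t, c in votes.items() if c >= min_votes}

# Hypothetical outputs from three local models (triples illustrative)
outs = [
    {("Ada", "born_in", "London"), ("Ada", "field", "math")},
    {("Ada", "born_in", "London"), ("Ada", "field", "math")},
    {("Ada", "born_in", "London")},
]
accepted = ensemble_triples(outs)
```

A downstream filter could then route unanimous triples to a verification step instead of accepting them outright.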
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠A research paper proposes a comprehensive policy framework for India to address fragmentation in biomedical data sharing by aligning institutional incentives around AI and digital health. The framework recommends recognizing data curation in academic promotions, incorporating open data metrics into institutional rankings, and implementing Shapley Value-based revenue sharing in federated learning—while navigating India's 2023 data protection regulations.
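Shapley-value revenue sharing, as proposed above, attributes payoff by averaging each institution's marginal contribution over all join orders. A minimal exact computation is sketched below; the three "hospitals" and their payoff table are invented, and a real federated-learning deployment would use an approximation rather than enumerating every ordering.

```python
from itertools import permutations

def shapley_shares(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings. `value` maps a frozenset of
    players to the coalition's payoff. Tractable only for a handful
    of institutions; larger settings need sampling approximations."""
    shares = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            shares[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: s / len(perms) for p, s in shares.items()}

# Illustrative payoff: model-accuracy lift from three data providers.
# A and B are complementary; C's data adds nothing in this toy game.
payoff = {frozenset(): 0.0, frozenset("A"): 10.0, frozenset("B"): 10.0,
          frozenset("C"): 0.0, frozenset("AB"): 30.0,
          frozenset("AC"): 10.0, frozenset("BC"): 10.0,
          frozenset("ABC"): 30.0}
shares = shapley_shares("ABC", lambda s: payoff[s])
```

By construction the shares sum to the grand-coalition payoff, which is what makes them usable as a revenue split.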
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose MADQRL, a distributed quantum reinforcement learning framework that enables multiple agents to learn independently across high-dimensional environments. The approach demonstrates ~10% improvement over classical distribution strategies and ~5% gains versus traditional policy representation models, addressing computational constraints of current quantum hardware in multi-agent settings.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose Trajectory Induced Preference Optimization (TIPO), a novel method for training mobile GUI agents to respect user privacy preferences while maintaining task execution capability. The approach addresses the challenge that privacy-conscious users generate structurally different execution patterns than utility-focused users, requiring specialized optimization techniques to properly align agent behavior with individual privacy preferences.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose AI as a Research Object (AI-RO), a governance framework that treats generative AI interactions as inspectable, documented components of scientific research rather than debating authorship. The framework combines interaction logs, metadata packaging, and provenance records to ensure accountability, particularly for security and privacy research where confidentiality and auditability are critical.
🏢 Meta
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠A study evaluating the consistency of exercise prescriptions generated by Gemini 2.5 Flash found high semantic consistency but significant variability in quantitative components like exercise intensity. The research highlights that while LLMs produce semantically similar outputs, structural constraints and expert validation are necessary before clinical deployment.
🧠 Gemini
AI · Bullish · arXiv – CS AI · Apr 14 · 5/10
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers identify a critical architectural gap in leading AI agent frameworks (CoALA and JEPA), which lack an explicit Knowledge layer with distinct persistence semantics. The paper proposes a four-layer decomposition model with fundamentally different update mechanics for knowledge, memory, wisdom, and intelligence, with working implementations demonstrating feasibility.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose SGH (Structured Graph Harness), a framework that replaces iterative Agent Loops with explicit directed acyclic graphs (DAGs) for LLM agent execution. The approach addresses structural weaknesses in current agent design by enforcing immutable execution plans, separating planning from recovery, and implementing strict escalation protocols, trading some flexibility for improved controllability and verifiability.
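The DAG-vs-loop distinction above can be made concrete with a tiny executor: the plan is fixed before execution, steps run in topological order, and a failure escalates instead of triggering ad-hoc replanning. This is a sketch of the general pattern, not SGH itself; the three-step plan and its actions are invented.

```python
from graphlib import TopologicalSorter

def run_plan(dag, actions, escalate):
    """Execute an immutable DAG plan node-by-node in topological
    order. Unlike an open-ended agent loop, the plan cannot be
    rewritten mid-run: a failing step invokes `escalate` (recovery
    is kept separate from planning) and halts execution."""
    done = {}
    for node in TopologicalSorter(dag).static_order():
        try:
            done[node] = actions[node](done)
        except Exception as exc:
            escalate(node, exc)   # strict escalation, no silent retry
            raise
    return done

# Hypothetical three-step plan: fetch -> parse -> summarize
dag = {"parse": {"fetch"}, "summarize": {"parse"}}
actions = {
    "fetch": lambda _: "raw text",
    "parse": lambda d: d["fetch"].split(),
    "summarize": lambda d: f"{len(d['parse'])} tokens",
}
result = run_plan(dag, actions,
                  escalate=lambda node, exc: print("escalate:", node))
```

Because the graph is declared up front, the whole run is verifiable before any step executes, which is the controllability trade the paper describes.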
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce an interactive workflow combining Sparse Autoencoders (SAE) and activation steering to make AI explainability actionable for practitioners. Through expert interviews with debugging tasks on CLIP, the study reveals that activation steering enables hypothesis testing and intervention-based debugging, though practitioners emphasize trust in observed model behavior over explanation plausibility and identify risks like ripple effects and limited generalization.
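Activation steering, at its core, means adding a scaled feature direction to a hidden state mid-forward-pass. The toy sketch below uses identity layers with ReLU so the effect is easy to see; in the paper's setting this would be a forward hook on a CLIP layer with the vector taken from a sparse-autoencoder feature, so everything here is illustrative.

```python
def steered_forward(x, n_layers=3, steer_at=None, vec=None, alpha=1.0):
    """Toy activation steering: run x through identity+ReLU layers
    and add `alpha * vec` to the hidden state after layer
    `steer_at`. Real interventions hook a chosen layer of a trained
    model; identity weights keep this example deterministic."""
    h = list(x)
    for i in range(n_layers):
        h = [max(v, 0.0) for v in h]          # ReLU (identity weights)
        if i == steer_at and vec is not None:
            h = [v + alpha * s for v, s in zip(h, vec)]
    return h

x = [1.0, -1.0, 0.5, -0.2]
base = steered_forward(x)                               # no intervention
steered = steered_forward(x, steer_at=1, vec=[1.0] * 4, alpha=2.0)
```

Comparing `base` and `steered` on the same input is exactly the hypothesis-testing loop the practitioners in the study describe: intervene, observe the behavioral change, and watch for ripple effects in later layers.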
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose a reactor-model-of-computation approach using the Lingua Franca framework to address nondeterminism challenges in AI-powered human-in-the-loop cyber-physical systems. The study uses an agentic driving coach as a case study to demonstrate how foundation models like LLMs can be deployed in safety-critical applications while maintaining deterministic behavior despite unpredictable human and environmental variables.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers present OIDA, a framework that adds epistemic structure to organizational knowledge systems by tracking commitment strength, contradiction status, and gaps in understanding. The framework introduces a QUESTION primitive that surfaces organizational ignorance with increasing urgency, addressing a capability absent from current retrieval-augmented generation (RAG) systems.
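A QUESTION primitive of the kind described above can be sketched as a small record type: an open gap in organizational knowledge whose urgency grows each review cycle it stays unresolved, so long-ignored ignorance surfaces more loudly. The class and its fields are an assumption for illustration, not the OIDA implementation.

```python
from dataclasses import dataclass

@dataclass
class Question:
    """A tracked gap in organizational understanding. Each review
    cycle an unresolved question becomes more urgent; resolving it
    freezes the urgency. A real system would attach provenance,
    contradiction status, and commitment strength alongside this."""
    text: str
    urgency: int = 1
    open: bool = True

    def review(self):
        if self.open:
            self.urgency += 1

    def resolve(self):
        self.open = False

q = Question("Which customers are affected by the schema change?")
for _ in range(3):      # three review cycles pass unanswered
    q.review()
```

After three unanswered cycles the question's urgency has quadrupled, which is the escalation behavior a plain RAG index has no way to express.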
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠A research paper examines the paradox where professionals collaborating with AI systems to enhance their capabilities risk accelerating automation of their own expertise. The analysis proposes frameworks for professionals to preserve and transform their value while codifying tacit knowledge, with implications for education and organizational policy.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce MCERF, a multimodal retrieval framework that combines vision-language models with LLM reasoning to improve question-answering from engineering documents. The system achieves a 41.1% relative accuracy improvement over baseline RAG systems by handling complex multimodal content like tables, diagrams, and dense technical text through adaptive routing and hybrid retrieval strategies.
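Adaptive routing of the kind MCERF describes dispatches each query to the retriever best suited to its content type. The keyword-rule router below is a deliberately simplified stand-in: the retriever names and cue words are invented, and the real system would use learned routing over vision-language and text indexes rather than string matching.

```python
def route_query(question,
                table_hints=("table", "column", "row"),
                figure_hints=("figure", "diagram", "plot")):
    """Toy adaptive router for hybrid multimodal retrieval: pick a
    retriever per query from surface cues. Illustrative only -- a
    production router would be a learned classifier."""
    q = question.lower()
    if any(w in q for w in table_hints):
        return "table_retriever"
    if any(w in q for w in figure_hints):
        return "vlm_retriever"
    return "dense_text_retriever"
```

Even this crude split captures the design point: dense text retrieval alone fails on tables and diagrams, so routing precedes retrieval.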
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠SRBench introduces a comprehensive evaluation framework for sequential recommendation that covers both LLM-based models and traditional neural network approaches. The benchmark addresses critical gaps in existing evaluation methodology by incorporating fairness, stability, and efficiency metrics alongside accuracy, and establishes fair comparison mechanisms between LLM-based and neural network-based recommenders.
🏢 Meta
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce AEG, a bare-metal runtime framework that enables high-performance machine learning inference on heterogeneous AI accelerators without OS overhead. The system achieves 9.2× higher compute efficiency and uses 11× fewer hardware tiles than Linux-based alternatives, demonstrating significant potential for edge AI deployment optimization.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠This academic paper proposes a neuro-symbolic approach for AGI robots combining neural networks with formal logic reasoning using Belnap's 4-valued logic system. The framework enables robots to handle unknown information, inconsistencies, and paradoxes while maintaining controlled security through axiom-based logic inference.
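Belnap's 4-valued logic extends {True, False} with Both (contradictory evidence) and Neither (no evidence), which is what lets a robot reason past inconsistencies instead of halting. A standard encoding represents each value as a (told-true, told-false) pair, making the connectives one-liners; this is the textbook logic, not the paper's axiom system.

```python
# Belnap values as (told_true, told_false) pairs:
# True, False, Both (contradiction), Neither (no information)
T, F, B, N = (1, 0), (0, 1), (1, 1), (0, 0)

def b_not(a):
    """Negation swaps the evidence for truth and falsity,
    leaving Both and Neither fixed."""
    return (a[1], a[0])

def b_and(a, b):
    """Conjunction: told true iff both are told true;
    told false if either is told false."""
    return (a[0] & b[0], a[1] | b[1])

def b_or(a, b):
    """Disjunction, dual to conjunction."""
    return (a[0] | b[0], a[1] & b[1])
```

Note the characteristically non-classical result that Both AND Neither evaluates to False: there is evidence for falsity (from Both) and none for truth (Neither contributes nothing).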
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers fine-tuned Qwen2.5-VL-32B, a leading open-source vision-language model, to improve its ability to autonomously perform web interactions through visual input alone. Using a two-stage training approach that addresses cursor localization, instruction sensitivity, and overconfidence bias, the model's success rate on single-click web tasks improved from 86% to 94%.