12,704 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose a multi-layer AI agent framework designed to support longitudinal health tasks over extended periods, addressing critical gaps in current implementations around user intent, accountability, and sustained goal alignment. The framework emphasizes adaptation, coherence, continuity, and agency across repeated interactions, offering guidance for developing safer, more personalized health AI systems that move beyond isolated interventions.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠A new research paper proposes a governance framework for personal AI memory systems designed to function as 'companion' knowledge wikis that mirror user knowledge while compensating for epistemic failures like entrenchment and evidence suppression. The work addresses an emerging 2026 landscape of memory architectures for large language models through five operational mechanisms (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT) aimed at preventing user-coupled drift in single-user knowledge systems.
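The summary names the five mechanisms but not their internals. A minimal sketch of how two of them, TRIAGE and DECAY, might operate on a single-user memory store; the data structure, thresholds, and half-life semantics are all assumptions, not the paper's design:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    confidence: float      # how well-evidenced the stored claim is
    last_accessed: float   # seconds since epoch

def triage(entries, min_confidence=0.3):
    """TRIAGE (assumed semantics): drop entries whose support has fallen
    below a floor, so weakly-evidenced claims cannot entrench."""
    return [e for e in entries if e.confidence >= min_confidence]

def decay(entries, now, half_life_days=30.0):
    """DECAY (assumed semantics): halve an entry's confidence for every
    half-life it goes unaccessed, countering stale entrenchment."""
    for e in entries:
        age_days = (now - e.last_accessed) / 86400
        e.confidence *= 0.5 ** (age_days / half_life_days)
    return entries
```

Running DECAY before TRIAGE is one way such mechanisms could compose: confidence erodes with disuse until TRIAGE finally evicts the entry.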
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers have developed a context-selective, multimodal memory system for social robots that mimics human cognitive processes by prioritizing emotionally salient and novel experiences. The system combines text and visual data to enable personalized, context-aware interactions with users, outperforming existing memory models and maintaining real-time performance.
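A toy sketch of salience-gated retention under the prioritization the summary describes; the weighting scheme, field names, and capacity bound are illustrative assumptions, not the paper's model:

```python
def salience(experience, seen_topics, w_emotion=0.6, w_novelty=0.4):
    """Assumed scoring: weighted mix of emotional intensity and novelty."""
    novelty = 0.0 if experience["topic"] in seen_topics else 1.0
    return w_emotion * experience["emotion"] + w_novelty * novelty

def retain(experiences, capacity):
    """Keep only the most salient experiences in a bounded memory."""
    seen, scored = set(), []
    for exp in experiences:                  # score in arrival order
        scored.append((salience(exp, seen), exp))
        seen.add(exp["topic"])
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in scored[:capacity]]
```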
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠LLM-HYPER is a new framework that uses large language models as hypernetworks to generate click-through rate prediction models for cold-start ads without traditional training. The system achieved a 55.9% improvement over baseline methods in offline tests and has been successfully deployed in production on a major U.S. e-commerce platform.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce Spatial Atlas, a compute-grounded reasoning system that combines deterministic spatial computation with large language models to create spatial-aware research agents. The framework demonstrates competitive performance on two benchmarks—FieldWorkArena for multimodal spatial question-answering and MLE-Bench for machine learning competitions—while improving interpretability by grounding reasoning in structured spatial scene graphs rather than relying on hallucinated outputs.
🏢 OpenAI · 🏢 Anthropic
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce a new behavioral measurement framework for tool-augmented language models deployed in organizations, using a two-dimensional Action Rate and Refusal Signal space to profile how LLM agents execute tasks under different autonomy configurations and risk contexts. The approach prioritizes execution-layer characterization over aggregate safety scoring, revealing that reflection-based scaffolding systematically shifts agent behavior in high-risk scenarios.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce SLATE, a large-scale benchmark for evaluating AI agents using APIs, and propose Entropy-Guided Branching (EGB), a search algorithm that improves task success rates and computational efficiency. The work addresses critical limitations in deploying language models within complex tool environments by establishing rigorous evaluation frameworks and reducing the computational burden of exploring massive decision spaces.
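The core idea of entropy-guided branching, spending search budget only where the model is uncertain, can be sketched in a few lines; the threshold, beam width, and action names are assumptions, not values from the paper:

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def next_steps(probs, actions, threshold=0.5, beam=2):
    """Branch into several candidate actions only when the policy is
    uncertain (high entropy); otherwise commit greedily to save compute."""
    ranked = sorted(zip(probs, actions), reverse=True)
    if entropy(probs) > threshold:
        return [a for _, a in ranked[:beam]]   # explore
    return [ranked[0][1]]                      # exploit
```

A confident distribution yields one branch; a near-uniform one fans out, which is how such a scheme could trade success rate against compute.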
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Aethon is a new systems primitive that enables stateful AI agents to be instantiated in near-constant time by using reference-based replication instead of full materialization. This architectural innovation addresses latency and memory overhead constraints in existing AI runtime systems, making it possible to spawn, specialize, and govern agents at production scale.
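Reference-based replication resembles copy-on-write: clones share one immutable base and write only to a private overlay. A minimal sketch under that reading; the class shape and state keys are invented for illustration and are not Aethon's API:

```python
from types import MappingProxyType

class Agent:
    """Sketch of reference-based replication: spawned agents share one
    read-only base state and mutate only a private overlay."""
    def __init__(self, base, overlay=None):
        self._base = base                  # shared, never copied
        self._overlay = overlay or {}      # per-agent mutations

    def get(self, key):
        return self._overlay.get(key, self._base[key])

    def set(self, key, value):
        self._overlay[key] = value

    def spawn(self):
        """Near-constant time: no materialization of the base state."""
        return Agent(self._base)
```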
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose Opinion-Aware Retrieval-Augmented Generation (RAG) to address a critical bias in current LLM systems that treat subjective content as noise rather than valuable information. By formalizing the distinction between factual queries (epistemic uncertainty) and opinion queries (aleatoric uncertainty), the team develops an architecture that preserves diverse perspectives in knowledge retrieval, demonstrating 26.8% improved sentiment diversity and 42.7% better entity matching on real-world e-commerce data.
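The factual/opinion split can be sketched as a retrieval router; the lexical cue list is a crude stand-in for whatever classifier the paper uses, and the corpus fields are assumptions:

```python
OPINION_CUES = {"best", "worth", "should", "recommend", "like", "opinion"}

def is_opinion_query(query):
    """Crude lexical router; a real system would use a learned classifier
    to separate epistemic from aleatoric queries."""
    return any(word in OPINION_CUES for word in query.lower().split())

def retrieve(query, corpus, k=2):
    if is_opinion_query(query):
        # aleatoric uncertainty: keep one passage per sentiment so the
        # answer reflects the spread of opinion, not just the majority
        per_sentiment = {}
        for doc in corpus:
            per_sentiment.setdefault(doc["sentiment"], doc)
        return list(per_sentiment.values())
    # epistemic uncertainty: plain relevance ranking
    return sorted(corpus, key=lambda d: d["score"], reverse=True)[:k]
```

The opinion path deliberately ignores relevance order for sentiment coverage, which is the kind of behavior the sentiment-diversity metric above would reward.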
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers present EMBER, a hybrid architecture combining spiking neural networks with large language models where the SNN acts as a persistent, biologically-inspired memory substrate that autonomously triggers LLM reasoning. The system demonstrates emergent autonomous behavior, initiating unprompted user contact after learning associations during idle periods, suggesting a fundamental shift in how AI systems could coordinate cognition and action.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠TRUST Agents is a multi-agent AI framework designed to improve fake news detection and fact verification by combining claim extraction, evidence retrieval, verification, and explainable reasoning. Unlike binary classification approaches, the system generates transparent, human-inspectable reports with logic-aware reasoning for complex claims, though it shows that retrieval quality and uncertainty calibration remain significant challenges in automated fact verification.
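The extract-retrieve-verify-explain pipeline can be sketched end to end; here every stage is a stub (sentence splitting for extraction, a dict lookup for retrieval), so only the report shape, not the intelligence, reflects the system:

```python
def extract_claims(article):
    """Stand-in claim extractor: one claim per sentence."""
    return [s.strip() for s in article.split(".") if s.strip()]

def verify_claim(claim, evidence_index):
    """Stand-in verifier over a toy evidence index; the real system
    retrieves evidence and reasons over it with LLM agents."""
    evidence = evidence_index.get(claim)
    if evidence is None:
        return {"claim": claim, "verdict": "unverified", "source": None}
    verdict = "supported" if evidence["supports"] else "refuted"
    return {"claim": claim, "verdict": verdict, "source": evidence["source"]}

def fact_report(article, evidence_index):
    """A transparent per-claim report instead of one fake/real label."""
    return [verify_claim(c, evidence_index) for c in extract_claims(article)]
```

Note how the "unverified" verdict surfaces exactly the retrieval-quality gap the summary flags: claims with no evidence are reported as such rather than forced into a binary label.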
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers demonstrate that MMA2A, a multimodal routing protocol for agent-to-agent networks, achieves 52% task accuracy versus 32% for text-only baselines by preserving native modalities (voice, image, text) across agent boundaries. The 20-percentage-point improvement requires both protocol-level native routing and capable downstream reasoning agents, establishing routing as a critical design variable in multi-agent systems.
$TCA
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers evaluated GPT-4o's ability to score physics exam responses using rubric-assisted scoring, finding that AI reliability matches human inter-rater consistency when rubrics are well-structured and granular. The study reveals that clear rubric design matters far more than LLM configuration choices, with performance declining on ambiguous mid-range responses.
🧠 GPT-4
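"Granular rubric" here means each criterion is judged independently and the points are summed, rather than one holistic grade. A toy sketch with invented criteria and cue strings; a real grader (human or LLM) would judge each criterion semantically, not by substring match:

```python
# Each criterion awards its points independently -- the granularity the
# study found matters more than which model configuration does the grading.
RUBRIC = [
    (2, "conservation of momentum"),   # invokes the right principle
    (1, "m1*v1"),                      # sets up the equation
    (1, "2.5 m/s"),                    # correct numeric result
]

def rubric_score(response, rubric=RUBRIC):
    text = response.lower()
    return sum(points for points, cue in rubric if cue in text)
```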
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce HintMR, a hint-assisted reasoning framework that improves mathematical problem-solving in small language models by using a separate hint-generating model to provide contextual guidance through multi-step problems. This collaborative two-model system demonstrates significant accuracy improvements over standard prompting while maintaining computational efficiency.
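The two-model collaboration pattern can be sketched with both models stubbed out; the hint heuristics and prompt format below are placeholders, not HintMR's actual models or prompts:

```python
def hint_model(problem):
    """Stand-in for the hint-generating model: emits strategic guidance
    keyed to surface features of the problem."""
    if "%" in problem:
        return "rewrite the percentage as a fraction before multiplying"
    return "decompose into single-operation steps and solve in order"

def solver_model(problem, hint):
    """Stand-in for the small solver model, conditioned on the hint."""
    return f"[using hint: {hint}] working on: {problem}"

def hint_assisted_solve(problem):
    return solver_model(problem, hint_model(problem))
```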
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers demonstrated that memory length in LLM-based multi-agent systems produces contradictory effects on cooperation depending on the model used: Gemini showed suppressed cooperation with longer memory, while Gemma exhibited enhanced cooperation. The findings suggest model-specific characteristics and alignment mechanisms fundamentally shape emergent social behaviors in AI agent systems.
🧠 Gemini
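Why memory length can swing cooperation either way is easy to see in a toy agent: how long a defection stays inside the memory window directly controls how long cooperation is withheld. This grudge-holding policy is an illustration of the dynamic, not the paper's LLM agents:

```python
from collections import deque

class GrudgeAgent:
    """Cooperates unless a defection is still inside its memory window,
    so memory length directly tunes how long cooperation is suppressed."""
    def __init__(self, memory_len):
        self.memory = deque(maxlen=memory_len)

    def act(self):
        return "defect" if "defect" in self.memory else "cooperate"

    def observe(self, opponent_move):
        self.memory.append(opponent_move)
```

With a longer window the same agent punishes for more rounds; an LLM's alignment tuning could invert that effect, which is the model-specific divergence the study reports.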
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠A comprehensive scoping review of 52 studies examines Large Language Model-based pedagogical agents across educational contexts from November 2022 to January 2025. The research identifies four key design dimensions (interaction approach, domain scope, role complexity, system integration) and emerging trends including multi-agent systems, virtual student simulation, and integration with immersive technologies, while flagging critical research gaps around privacy, accuracy, and student autonomy.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose Heuristic Classification of Thoughts (HCoT), a novel prompting method that integrates expert system heuristics into large language models to improve structured reasoning on complex problems. The approach addresses LLMs' stochastic token generation and decoupled reasoning mechanisms by using heuristic classification to guide and optimize decision trajectories, demonstrating superior performance and token efficiency compared to existing methods like Chain-of-Thought and Tree-of-Thoughts prompting.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce a sequential unlearning framework that enables Large Language Models to forget sensitive data while maintaining performance, addressing GDPR compliance and the Right to be Forgotten in politically sensitive deployments. The method stabilizes general capabilities through positive fine-tuning before selectively suppressing designated patterns, demonstrating effectiveness on the SemEval-2025 benchmark with minimal accuracy degradation.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose a pattern reduction framework for explainable clustering that eliminates redundant k-relaxed frequent patterns (k-RFPs) while maintaining cluster quality. The approach uses formal characterization and optimization strategies to reduce computational complexity in knowledge-driven unsupervised learning systems.
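The summary does not define the k-relaxed redundancy criterion, but its exact-match special case reduces to classic closed-pattern filtering, which a few lines can show: a pattern is redundant when a strict superset covers exactly the same transactions.

```python
def reduce_patterns(patterns):
    """Closed-pattern filtering (the exact, k=0-style special case):
    drop pattern p when a strict superset has identical support,
    since p then describes the same transactions with less detail."""
    items = list(patterns.items())
    return {
        p: sup
        for p, sup in items
        if not any(p < q and sup == s for q, s in items)
    }
```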
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠The first LLM Testing competition at ICSE 2026's DeepTest workshop evaluated four tools designed to benchmark an LLM-based automotive assistant, focusing on their ability to identify failure cases where the system fails to surface critical safety warnings from car manuals. The competition assessed both the effectiveness of test discovery and the diversity of identified failures, establishing a benchmark for evaluating AI testing methodologies in safety-critical applications.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce KnowRL, a reinforcement learning framework that improves large language model reasoning by using minimal, strategically-selected knowledge points rather than verbose hints. The approach achieves state-of-the-art results on reasoning benchmarks at the 1.5B parameter scale, with the trained model and code made publicly available.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose RPRA (Reason-Predict-Reason-Answer/Act), a framework enabling smaller language models to predict how a larger LLM judge would evaluate their outputs before responding. By routing simple queries to smaller models and complex ones to larger models, the approach reduces computational costs while maintaining output quality, with fine-tuned smaller models achieving up to 55% accuracy improvements.
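The predict-then-route control flow can be sketched with both models and the judge predictor stubbed; the word-count difficulty proxy and threshold are placeholders for the fine-tuned self-evaluation the paper describes:

```python
def small_model(query):
    return f"concise answer to: {query}"

def large_model(query):
    return f"detailed answer to: {query}"

def predicted_judge_score(query):
    """Stand-in for the small model predicting how an LLM judge would
    rate its own draft; word count is a crude difficulty proxy."""
    return 0.9 if len(query.split()) <= 6 else 0.4

def answer(query, threshold=0.7):
    if predicted_judge_score(query) >= threshold:
        return small_model(query), "small"
    return large_model(query), "large"   # escalate only when needed
```

The cost saving comes from the routing asymmetry: the large model runs only on queries the small model predicts it would fail.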
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠A comprehensive survey examines AI methodologies for simulating mixed autonomous and human-driven traffic, addressing critical gaps in current simulation tools. The research proposes a unified taxonomy of AI methods spanning agent-level behavior models, environment-level simulations, and physics-informed approaches to improve autonomous vehicle testing and validation.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose LIFE, an energy-efficient AI framework designed to address the computational demands of high-performance computing systems through continual learning and agentic AI rather than monolithic transformers. The system combines orchestration, context engineering, memory management, and lattice learning to enable self-evolving network operations, demonstrated through HPC latency spike detection and mitigation.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce Text2Model and Text2Zinc, frameworks that use large language models to translate natural language descriptions into formal optimization and satisfaction models. The work represents the first unified approach combining both problem types with a solver-agnostic architecture, though experiments reveal LLMs remain imperfect at this task despite showing competitive performance.