AI

20,983 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

Researchers introduce Dejavu, a post-deployment learning framework that enables frozen Vision-Language-Action policies to improve through experience retrieval and feedback networks. The system allows embodied AI agents to continuously learn from past trajectories without retraining, improving task performance across diverse robotic applications.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Researchers introduce Sequence-Level PPO (SPPO), a new algorithm that improves how large language models are trained for reasoning tasks by addressing stability and computational efficiency issues in standard reinforcement learning approaches. SPPO matches the performance of resource-heavy methods while significantly reducing memory and computational costs, potentially accelerating LLM alignment for complex problem-solving.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Researchers propose StaRPO, a reinforcement learning framework that improves large language model reasoning by incorporating stability metrics alongside task rewards. The method uses Autocorrelation Function and Path Efficiency measurements to evaluate logical coherence and goal-directedness, demonstrating improved accuracy and reasoning consistency across four benchmarks.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Researchers introduce AV-SpeakerBench, a new 3,212-question benchmark designed to evaluate how well multimodal large language models understand audiovisual speech by correlating speakers with their dialogue and timing. Testing reveals that Gemini 2.5 Pro significantly outperforms open-source competitors, with the gap primarily attributable to the open-source models' weaker audiovisual fusion rather than to limitations in visual perception.

🧠 Gemini

AI · Bearish · arXiv – CS AI · Apr 13 · 6/10

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Researchers found that large language models fail to accurately simulate human susceptibility to misinformation, consistently overstating how attitudes drive belief and sharing while ignoring social network effects. The study reveals systematic biases in how LLMs represent misinformation concepts, suggesting they are better suited to identifying where AI diverges from human judgment than to replacing human survey responses.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Artifacts as Memory Beyond the Agent Boundary

Researchers formalize how agents can use environmental artifacts as external memory to reduce computational requirements in reinforcement learning tasks. The study demonstrates that spatial observations can implicitly serve as memory substitutes, allowing agents to learn effective policies with less internal memory capacity than previously thought necessary.

AI · Bearish · arXiv – CS AI · Apr 13 · 6/10

Adversarial Evasion Attacks on Computer Vision using SHAP Values

Researchers demonstrate a white-box adversarial attack on computer vision models that uses SHAP values to identify and exploit critical input features. The attack proves more robust than the Fast Gradient Sign Method, particularly when gradient information is hidden.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Model Space Reasoning as Search in Feedback Space for Planning Domain Generation

Researchers present a novel approach using agentic language model feedback frameworks to generate planning domains from natural language descriptions augmented with symbolic information. The method employs heuristic search over model space optimized by various feedback mechanisms, including landmarks and plan validator outputs, to improve domain quality for practical deployment.

AI · Bearish · arXiv – CS AI · Apr 13 · 6/10

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison

Researchers conducted a large-scale computational analysis comparing 17,790 articles from Grokipedia, Elon Musk's AI-generated encyclopedia, against Wikipedia. The study found that Grokipedia articles are longer but contain fewer citations, with some entries showing systematic rightward political bias in media sources, particularly in history, religion, and arts sections.

🏢 xAI · 🧠 Grok

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning

Researchers introduce VISOR, a new agentic visual retrieval-augmented generation system that improves how AI models reason over multi-page visual documents. By addressing key technical challenges in evidence gathering and context management, VISOR achieves state-of-the-art results on complex visual reasoning tasks.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Constraining Sequential Model Editing with Editing Anchor Compression

Researchers propose Editing Anchor Compression (EAC), a framework that addresses degradation of large language models' general abilities during sequential knowledge editing. By constraining parameter matrix deviations through selective anchor compression, EAC preserves over 70% of model performance while maintaining edited knowledge, advancing the practical viability of model editing as an alternative to expensive retraining.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

Researchers introduce AgentSociety, a large-scale simulator using LLM-driven agents to study human behavior and social dynamics. The system simulates over 10,000 agents and 5 million interactions to model real-world social phenomena including polarization, policy impacts, and urban sustainability, demonstrating alignment with actual experimental results.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization

Researchers introduce NLCO, a benchmark for evaluating large language models on natural-language combinatorial optimization problems without external solvers or code generation. Testing across modern LLMs reveals that while high-performing models handle small instances well, performance degrades significantly as problem complexity increases, with graph-structured and bottleneck-objective problems proving particularly challenging.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences

Researchers introduce ReplicatorBench, a comprehensive benchmark for evaluating AI agents' ability to replicate scientific research claims in social and behavioral sciences. The study reveals that current LLM agents excel at designing and executing experiments but struggle significantly with data retrieval, highlighting critical gaps in autonomous research validation capabilities.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning

Researchers propose TRU (Targeted Reverse Update), a machine unlearning framework designed to efficiently remove user data from multimodal recommendation systems without full retraining. The method addresses non-uniform data influence across ranking behavior, modality branches, and network layers through coordinated interventions, achieving better performance than existing approximate unlearning approaches.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Researchers have developed RandSymKL, a debiasing technique for Bangla language models that mitigates gender bias in classification tasks like sentiment analysis and hate speech detection. The study introduces four manually annotated benchmark datasets with gender-perturbation testing and demonstrates that the approach effectively reduces bias while maintaining competitive accuracy compared to existing methods.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

ASPECT: Analogical Semantic Policy Execution via Language Conditioned Transfer

Researchers introduce ASPECT, a reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer to novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

OmniPrism: Learning Disentangled Visual Concept for Image Generation

OmniPrism introduces a new visual concept disentanglement approach for AI image generation that separates multiple visual aspects (content, style, composition) to enable more controlled and creative outputs. The method uses a contrastive training pipeline and a new 200K paired dataset to train diffusion models that can incorporate disentangled concepts while maintaining fidelity to text prompts.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search

Researchers introduce Chain-in-Tree (CiT), a framework that optimizes large language model tree search by selectively branching only when necessary rather than at every step. The approach reduces computational overhead by 75-85% on math reasoning tasks with minimal accuracy loss, making inference-time scaling more practical for resource-constrained deployments.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Researchers introduce VisionFoundry, a synthetic data generation pipeline that uses LLMs and text-to-image models to create targeted training data for vision-language models. The approach addresses VLMs' weakness in visual perception tasks and demonstrates 7-10% improvements on benchmark tests without requiring human annotation or reference images.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise

Researchers introduce VisPrompt, a framework that improves prompt learning for vision-language models by injecting visual semantic information to enhance robustness against label noise. The approach keeps pre-trained models frozen while adding minimal trainable parameters, demonstrating superior performance across seven benchmark datasets under both synthetic and real-world noisy conditions.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos

Researchers provide the first rigorous theoretical analysis of OPTQ (also known as GPTQ), a widely used post-training quantization algorithm for neural networks and LLMs, establishing quantitative error bounds and validating practical design choices. The study extends theoretical guarantees to both deterministic and stochastic variants of OPTQ and to the Qronos algorithm, offering guidance for regularization parameter selection and quantization alphabet sizing.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Sample-Efficient Neurosymbolic Deep Reinforcement Learning

Researchers propose a neuro-symbolic deep reinforcement learning approach that integrates logical rules and symbolic knowledge to improve sample efficiency and generalization in RL systems. The method transfers partial policies from simple tasks to complex ones, reducing training data requirements and improving performance in sparse-reward environments compared to existing baselines.

AI · Neutral · The Register – AI · Apr 13 · 6/10

China wants AI to prepare school lessons and mark homework

China is promoting AI integration into education systems to automate lesson preparation and homework grading. This policy reflects Beijing's broader AI strategy to embed artificial intelligence across public services while addressing teacher shortages and education quality gaps.

AI · Neutral · The Register – AI · Apr 13 · 6/10

Linux 7.0 debuts as Linus Torvalds ponders AI's bug-finding powers and their impact on release process

Linux 7.0 has been released as Linus Torvalds explores how AI could enhance bug detection and streamline the kernel development process. The milestone reflects the open-source community's growing interest in leveraging AI tools to improve software quality and development workflows.
