12,721 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce RecaLLM, a post-trained language model that addresses the 'lost-in-thought' phenomenon, where retrieval performance degrades during extended reasoning chains. The model interleaves explicit in-context retrieval with reasoning steps and achieves strong performance on long-context benchmarks while using training contexts significantly shorter than those required by existing approaches.
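The interleaving idea can be sketched in a few lines; the retriever, question, and step list below are toy stand-ins, not RecaLLM's actual implementation:

```python
# Toy sketch of interleaved retrieval-and-reasoning (hypothetical helpers,
# not the paper's code).

def retrieve(context, query, k=1):
    # Stand-in lexical retriever: rank context sentences by word overlap.
    scored = sorted(context,
                    key=lambda s: -len(set(s.split()) & set(query.split())))
    return scored[:k]

def reason_with_recall(context, question, steps):
    """Re-ground each reasoning step on freshly retrieved evidence."""
    trace, query = [], question
    for step in steps:
        evidence = retrieve(context, query)   # explicit in-context retrieval
        trace.append((step, evidence[0]))     # the reasoning step sees evidence
        query = question + " " + step         # condition next lookup on the chain
    return trace
```

The point of the loop is that evidence is re-fetched before every step, so late steps are not left relying on a retrieval made many tokens earlier.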
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce BERT-as-a-Judge, a lightweight alternative to LLM-based evaluation methods that assesses generative model outputs with greater accuracy than lexical approaches while requiring significantly less computational overhead. The accompanying study shows that existing lexical evaluation techniques correlate poorly with human judgment across 36 models and 15 tasks, establishing a practical middle ground between rigid rule-based and expensive LLM-judge evaluation paradigms.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce VISOR, a new agentic visual retrieval-augmented generation system that improves how AI models reason over multi-page visual documents. By addressing key technical challenges in evidence gathering and context management, VISOR achieves state-of-the-art results on complex visual reasoning tasks.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce VisionFoundry, a synthetic data generation pipeline that uses LLMs and text-to-image models to create targeted training data for vision-language models. The approach addresses VLMs' weakness in visual perception tasks and demonstrates 7-10% improvements on benchmark tests without requiring human annotation or reference images.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce VisPrompt, a framework that improves prompt learning for vision-language models by injecting visual semantic information to enhance robustness against label noise. The approach keeps pre-trained models frozen while adding minimal trainable parameters, demonstrating superior performance across seven benchmark datasets under both synthetic and real-world noisy conditions.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce Chain-in-Tree (CiT), a framework that optimizes large language model tree search by selectively branching only when necessary rather than at every step. The approach reduces computational overhead by 75-85% on math reasoning tasks with minimal accuracy loss, making inference-time scaling more practical for resource-constrained deployments.
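A minimal sketch of the branch-only-when-necessary idea follows; the candidate generator, scores, and margin are hypothetical stand-ins, not the paper's actual branching policy:

```python
# Sketch of selective branching: extend a single chain while the top candidate
# is clearly ahead; branch only when the top two scores are close.

def step_candidates(state):
    # Hypothetical one-step expander: (next_state, step_score) pairs.
    return [(state + [c], s) for c, s in (("a", 0.9), ("b", 0.4), ("c", 0.3))]

def chain_in_tree(state, score, depth, margin=0.3):
    if depth == 0:
        return state, score
    cands = sorted(step_candidates(state), key=lambda x: -x[1])
    (s1, v1), (s2, v2) = cands[0], cands[1]
    if v1 - v2 < margin:
        # Ambiguous step: branch on both and keep the better finished chain.
        a = chain_in_tree(s1, score + v1, depth - 1, margin)
        b = chain_in_tree(s2, score + v2, depth - 1, margin)
        return a if a[1] >= b[1] else b
    # Confident step: no branching, so most of the tree is never expanded.
    return chain_in_tree(s1, score + v1, depth - 1, margin)
```

When most steps are confident, the search degenerates to a single chain, which is where the reported 75-85% overhead reduction would come from.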
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers propose a neuro-symbolic deep reinforcement learning approach that integrates logical rules and symbolic knowledge to improve sample efficiency and generalization in RL systems. The method transfers partial policies from simple tasks to complex ones, reducing training data requirements and improving performance in sparse-reward environments compared to existing baselines.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce NLCO, a benchmark for evaluating large language models on natural-language combinatorial optimization problems without external solvers or code generation. Testing across modern LLMs reveals that while high-performing models handle small instances well, performance degrades significantly as problem complexity increases, with graph-structured and bottleneck-objective problems proving particularly challenging.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce ReplicatorBench, a comprehensive benchmark for evaluating AI agents' ability to replicate scientific research claims in social and behavioral sciences. The study reveals that current LLM agents excel at designing and executing experiments but struggle significantly with data retrieval, highlighting critical gaps in autonomous research validation capabilities.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers propose TRU (Targeted Reverse Update), a machine unlearning framework designed to efficiently remove user data from multimodal recommendation systems without full retraining. The method addresses non-uniform data influence across ranking behavior, modality branches, and network layers through coordinated interventions, achieving better performance than existing approximate unlearning approaches.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce ASPECT, a novel reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer learning across novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers have developed RandSymKL, a debiasing technique for Bangla language models that mitigates gender bias in classification tasks like sentiment analysis and hate speech detection. The study introduces four manually annotated benchmark datasets with gender-perturbation testing and demonstrates that the approach effectively reduces bias while maintaining competitive accuracy compared to existing methods.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠OmniPrism introduces a new visual concept disentanglement approach for AI image generation that separates multiple visual aspects (content, style, composition) to enable more controlled and creative outputs. The method uses a contrastive training pipeline and a new 200K paired dataset to train diffusion models that can incorporate disentangled concepts while maintaining fidelity to text prompts.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce AgentSociety, a large-scale simulator using LLM-driven agents to study human behavior and social dynamics. The system simulates over 10,000 agents and 5 million interactions to model real-world social phenomena including polarization, policy impacts, and urban sustainability, demonstrating alignment with actual experimental results.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers propose Editing Anchor Compression (EAC), a framework that addresses degradation of large language models' general abilities during sequential knowledge editing. By constraining parameter matrix deviations through selective anchor compression, EAC preserves over 70% of model performance while maintaining edited knowledge, advancing the practical viability of model editing as an alternative to expensive retraining.
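One simple reading of "constraining parameter matrix deviations" is projecting each edit back into a norm ball around the pre-edit weights. The snippet below is a hypothetical illustration of that idea only, not EAC's actual anchor-compression scheme:

```python
import numpy as np

# Hypothetical illustration (not EAC itself): cap how far an edited weight
# matrix may drift from the original by projecting onto a Frobenius-norm ball.

def constrain_edit(W0, W_edited, max_dev=1.0):
    delta = W_edited - W0
    norm = np.linalg.norm(delta)
    if norm > max_dev:
        delta = delta * (max_dev / norm)   # rescale onto the ball's surface
    return W0 + delta
```

The intuition is the same as in the summary: edits that stay close to the original parameters are less likely to damage unrelated general abilities.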
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers provide the first rigorous theoretical analysis of OPTQ (GPTQ), a widely-used post-training quantization algorithm for neural networks and LLMs, establishing quantitative error bounds and validating practical design choices. The study extends theoretical guarantees to both deterministic and stochastic variants of OPTQ and the Qronos algorithm, offering guidance for regularization parameter selection and quantization alphabet sizing.
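The greedy error-compensation loop at the heart of OPTQ/GPTQ can be sketched as below. This is a simplified illustration (scalar uniform quantizer, no blocking or Cholesky factorization), not the implementation the paper analyzes:

```python
import numpy as np

def quantize(v, step=0.1):
    # Uniform round-to-nearest onto a fixed grid.
    return step * np.round(np.asarray(v, dtype=float) / step)

def optq_row(w, H, step=0.1):
    """Quantize one weight row w against the (regularized) Hessian H = X^T X / n,
    spreading each coordinate's quantization error onto later coordinates."""
    w = np.asarray(w, dtype=float).copy()
    q = np.zeros_like(w)
    Hinv = np.linalg.inv(H + 1e-6 * np.eye(len(w)))
    for i in range(len(w)):
        q[i] = quantize(w[i], step)
        err = (w[i] - q[i]) / Hinv[i, i]
        w[i + 1:] -= err * Hinv[i, i + 1:]   # compensate not-yet-quantized weights
    return q
```

The error bounds in the paper concern exactly this kind of sequential update, including how the regularization term added to H should be chosen.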
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers investigate how multimodal large language models (MLLMs) can assist with usability evaluation of user interfaces by analyzing text and visual context together. The study compares MLLM-generated assessments against expert evaluations, finding that these models can effectively prioritize usability issues by severity and offer complementary insights to traditional resource-intensive evaluation methods.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers propose AR-KAN, a neural network combining autoregressive models with Kolmogorov-Arnold Networks for improved time series forecasting. The model addresses limitations of traditional deep learning approaches by integrating temporal memory preservation with nonlinear function approximation, demonstrating superior performance on both synthetic and real-world datasets.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce CoA-LoRA, a method that dynamically adapts LoRA fine-tuning to different quantization configurations without requiring separate retraining for each setting. The approach uses a configuration-aware model and Pareto-based search to optimize low-rank adjustments across heterogeneous edge devices, achieving comparable performance to traditional methods with zero additional computational cost.
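The Pareto-based selection step can be illustrated with a generic non-dominated filter over (accuracy, cost) pairs; the configurations below are hypothetical, and this is not CoA-LoRA's search procedure itself:

```python
# Generic Pareto-front filter of the kind implied by "Pareto-based search":
# keep quantization configs that no other config beats on both accuracy
# (higher is better) and cost (lower is better). Illustrative only.

def pareto_front(configs):
    """configs: list of (name, accuracy, cost) tuples."""
    front = []
    for name, acc, cost in configs:
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for _, a, c in configs
        )
        if not dominated:
            front.append(name)
    return front
```

Each edge device would then pick the front member matching its memory budget, rather than retraining a separate adapter per configuration.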
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce Dejavu, a post-deployment learning framework that enables frozen Vision-Language-Action policies to improve through experience retrieval and feedback networks. The system allows embodied AI agents to continuously learn from past trajectories without retraining, improving task performance across diverse robotic applications.
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers conducted a large-scale computational analysis comparing 17,790 articles from Grokipedia, Elon Musk's AI-generated encyclopedia, against Wikipedia. The study found that Grokipedia articles are longer but contain fewer citations, with some entries showing systematic rightward political bias in media sources, particularly in history, religion, and arts sections.
🏢 xAI · 🧠 Grok
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce AV-SpeakerBench, a new 3,212-question benchmark designed to evaluate how well multimodal large language models understand audiovisual speech by correlating speakers with their dialogue and timing. Testing reveals Gemini 2.5 Pro significantly outperforms open-source competitors, with the gap primarily attributable to inferior audiovisual fusion capabilities rather than visual perception limitations.
🧠 Gemini
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers demonstrate a white-box adversarial attack on computer vision models that uses SHAP values to identify and exploit critical input features. The attack proves more robust than the Fast Gradient Sign Method, particularly when gradient information is obscured or hidden.
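For a linear model the SHAP values have a closed form (w_i · (x_i − E[x_i])), which makes the attribution-guided perturbation easy to sketch without the `shap` library. The model, input, and budget below are hypothetical, and this is not the paper's attack:

```python
import numpy as np

# Illustrative attribution-guided perturbation on a linear scorer: find the
# highest-|SHAP| features and push them against the current prediction.

rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.1, 0.05])       # linear model weights
X_bg = rng.normal(size=(100, 4))           # background distribution
x = np.array([0.5, -0.5, 1.0, 1.0])        # input to attack

def logit(v):
    return float(w @ v)

shap_vals = w * (x - X_bg.mean(axis=0))    # exact SHAP values for a linear model
top = np.argsort(-np.abs(shap_vals))[:2]   # two most influential features

eps = 0.25                                 # per-feature perturbation budget
x_adv = x.copy()
x_adv[top] -= eps * np.sign(w[top]) * np.sign(logit(x))
```

Because the feature ranking comes from attributions rather than raw gradients, the same recipe still applies when gradients are masked, which is the regime where the paper reports the advantage over FGSM.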
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers found that large language models fail to accurately simulate human susceptibility to misinformation, consistently overstating how attitudes drive belief and sharing while ignoring social network effects. The study reveals systematic biases in how LLMs represent misinformation concepts, suggesting they are better tools for identifying where AI diverges from human judgment rather than replacing human survey responses.
AI · Neutral · The Register – AI · Apr 13 · 6/10
🧠China is promoting AI integration into education systems to automate lesson preparation and homework grading. This policy reflects Beijing's broader AI strategy to embed artificial intelligence across public services while addressing teacher shortages and education quality gaps.