12,721 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce 3D-VCD, an inference-time framework that reduces hallucinations in 3D-LLM embodied agents by contrasting predictions against distorted scene graphs. The method addresses failures specific to 3D spatial reasoning without requiring model retraining, advancing reliability in embodied AI systems.
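The summary does not spell out the decoding rule, but contrastive decoding of this kind is usually a per-step logit correction. A minimal sketch, assuming a VCD-style formula with a single `alpha` weight (the distorted-scene-graph forward pass and `alpha` are assumptions, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def contrastive_decode_step(logits_clean: torch.Tensor,
                            logits_distorted: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    """One decoding step of VCD-style contrastive decoding.

    Tokens whose probability is inflated even under a distorted scene
    graph (hallucination-prone tokens) get down-weighted.
    """
    # Amplify the clean prediction and subtract the distorted one.
    contrasted = (1.0 + alpha) * logits_clean - alpha * logits_distorted
    return F.softmax(contrasted, dim=-1)

# Toy usage: vocabulary of 5 tokens.
clean = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
distorted = torch.tensor([2.0, 2.5, 0.5, 0.1, -1.0])  # token 1 only "fires" under distortion
probs = contrastive_decode_step(clean, distorted)
print(probs)  # token 1's probability is suppressed relative to a plain softmax
```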
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce MATU, a novel uncertainty quantification framework using tensor decomposition to address reliability challenges in Large Language Model-based Multi-Agent Systems. The method analyzes entire reasoning trajectories rather than single outputs, effectively measuring uncertainty across different agent structures and communication topologies.
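MATU's exact estimator is not given here; the sketch below only illustrates the general idea of scoring a whole reasoning trajectory by how well a low-rank structure explains it (the embedding layout and the residual-energy score are assumptions):

```python
import numpy as np

def trajectory_uncertainty(traj: np.ndarray, rank: int = 2) -> float:
    """Hypothetical uncertainty score for a multi-agent reasoning trajectory.

    traj: tensor of shape (agents, steps, embed_dim), e.g. embeddings of
    each agent's message at each round. A trajectory well explained by a
    low-rank structure (agents agreeing, stable reasoning) scores low;
    residual energy is read as uncertainty.
    """
    agents, steps, dim = traj.shape
    unfolded = traj.reshape(agents * steps, dim)      # unfold the 3-way tensor
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]   # best rank-r approximation
    residual = np.linalg.norm(unfolded - low_rank)
    return residual / np.linalg.norm(unfolded)        # normalised residual energy

rng = np.random.default_rng(0)
coherent = np.tile(rng.normal(size=(1, 1, 16)), (3, 5, 1)) + 0.01 * rng.normal(size=(3, 5, 16))
divergent = rng.normal(size=(3, 5, 16))
print(trajectory_uncertainty(coherent))   # near 0
print(trajectory_uncertainty(divergent))  # much larger
```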
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 A new study comparing large language models against graph-based parsers for relation extraction demonstrates that smaller, specialized architectures significantly outperform LLMs when processing complex linguistic graphs with multiple relations. This finding challenges the prevailing assumption that larger language models are universally superior for natural language processing tasks.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers benchmarked five frontier LLMs against human players in Cards Against Humanity games, finding that while models exceed random baseline performance, their humor preferences align poorly with humans but strongly with each other. The findings suggest LLM humor judgment may reflect systematic biases and structural artifacts rather than genuine preference understanding.
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers evaluated how well frontier LLMs like GPT-4o and Gemini interpret story morals across 14 language-culture pairs, finding that while models generate semantically similar outputs to humans, they lack cultural diversity and concentrate on universally shared values rather than culturally specific moral interpretations.
🧠 GPT-4 · 🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Workshop participants from academia, industry, and government convened in November 2025 to establish best practices for designing reinforcement learning environments in autonomous cyber defence. The resulting framework and guidelines address a critical gap in documented knowledge about RL environment development for network security applications, including critical infrastructure protection.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers demonstrate that HiFloat4, a 4-bit floating-point format, enables efficient large language model training on Huawei's Ascend NPUs with up to 4x improvements in compute throughput and memory efficiency. The study shows that specialized stabilization techniques can maintain accuracy within 1% of full-precision baselines while preserving computational gains across dense and mixture-of-experts architectures.
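HiFloat4's exact encoding is not described in the summary; the sketch below shows the generic mechanics of 4-bit floating-point fake quantization using the common FP4 (E2M1) value grid with a per-tensor scale, which is how such formats are typically simulated in low-precision training studies:

```python
import numpy as np

# Representable magnitudes of the common FP4 (E2M1) format. HiFloat4's
# actual encoding may differ; this only illustrates scaled 4-bit
# quantize-dequantize ("fake quantization").
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: np.ndarray) -> np.ndarray:
    """Round a tensor through a scaled 4-bit float grid and back."""
    scale = np.abs(x).max() / FP4_GRID[-1] + 1e-12    # per-tensor scale factor
    mags = np.abs(x) / scale
    # Snap each magnitude to the nearest representable FP4 value.
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
wq = fake_quant_fp4(w)
print(np.abs(w - wq).max())  # error stays bounded by the grid spacing
```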
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce Dictionary-Aligned Concept Control (DACO), a framework that uses a curated dictionary of 15,000 multimodal concepts and sparse autoencoders to improve safety in multimodal large language models by steering their activations at inference time. Testing across multiple models shows DACO significantly enhances safety performance while preserving general-purpose capabilities without requiring model retraining.
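The steering rule itself is not quoted in the summary, but SAE-based activation control generally amounts to adding or subtracting a decoder direction in the residual stream at inference time. A minimal sketch (the function name, `strength` parameter, and the random direction are illustrative assumptions):

```python
import torch

def steer_with_sae_direction(hidden: torch.Tensor,
                             concept_dir: torch.Tensor,
                             strength: float) -> torch.Tensor:
    """Inference-time steering along one SAE decoder direction.

    hidden:      (batch, seq, d_model) residual-stream activations
    concept_dir: (d_model,) decoder row of the targeted concept
    strength:    >0 amplifies the concept, <0 suppresses it
    """
    direction = concept_dir / concept_dir.norm()
    return hidden + strength * direction

# Toy usage with a random stand-in for an "unsafe-content" direction.
torch.manual_seed(0)
h = torch.randn(1, 4, 64)
unsafe_dir = torch.randn(64)
h_steered = steer_with_sae_direction(h, unsafe_dir, strength=-4.0)  # push away from the concept
```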
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 A research study reveals that people assign significantly more responsibility to human decision-makers when they work alongside AI systems compared to human teammates, even in scenarios involving moral harm. This 'AI-Induced Human Responsibility' (AIHR) effect stems from perceiving AI as a constrained tool rather than an autonomous agent, raising important questions about accountability structures in AI-augmented organizations.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 A research paper proposes a fundamental shift in how retrieval systems are evaluated, moving from traditional relevance-based metrics toward utility-centric optimization for large language models. The paper argues that retrieval effectiveness should be measured by its contribution to LLM-generated answer quality rather than by document ranking alone, reflecting the structural changes introduced by retrieval-augmented generation (RAG) systems.
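In code, the proposed shift is easy to state: score a document by the change in answer quality it induces, not by a relevance label. A sketch under stated assumptions (`generate` and `grade` are hypothetical helpers, e.g. an LLM call and an exact-match or judge-based scorer):

```python
def retrieval_utility(question, gold, doc, generate, grade):
    """Score a document by how much it improves the LLM's answer,
    rather than by its standalone relevance label."""
    baseline = grade(generate(question, docs=[]), gold)      # closed-book answer
    augmented = grade(generate(question, docs=[doc]), gold)  # RAG answer
    return augmented - baseline  # >0: the doc helped; <0: it hurt the answer
```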
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce WOMBET, a framework that improves reinforcement learning efficiency in robotics by generating synthetic training data from a world model in source tasks and selectively transferring it to target tasks. The approach combines offline-to-online learning with uncertainty-aware planning to reduce data collection costs while maintaining robustness.
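The summary suggests a pipeline of model-based rollout generation plus an uncertainty filter. One common way to realize "uncertainty-aware" selection is ensemble disagreement; the sketch below assumes that design (the ensemble, threshold, and interfaces are assumptions, not WOMBET's published details):

```python
import numpy as np

def synthesize_and_filter(ensemble, policy, start_states, horizon, tau):
    """Generic sketch: roll out a world-model ensemble to synthesize
    transitions, and transfer only the low-uncertainty ones.

    ensemble: list of models, each model(state, action) -> next_state
    policy:   policy(state) -> action
    """
    kept = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s)
            preds = np.stack([m(s, a) for m in ensemble])  # (n_models, state_dim)
            uncertainty = preds.std(axis=0).mean()         # ensemble disagreement
            s_next = preds.mean(axis=0)
            if uncertainty < tau:                          # trust only confident samples
                kept.append((s, a, s_next))
            s = s_next
    return kept
```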
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce Litmus (Re)Agent, an agentic system that predicts how multilingual AI models will perform on tasks lacking direct benchmark data. Using a controlled benchmark of 1,500 questions across six tasks, the system decomposes queries into hypotheses and synthesizes predictions through structured reasoning, outperforming competing approaches particularly when direct evidence is sparse.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce PerMix-RLVR, a training method that enables large language models to maintain persona flexibility while preserving task robustness. The approach addresses a fundamental trade-off in reinforcement learning with verifiable rewards, where models become less responsive to persona prompts but gain improved performance on objective tasks.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce ASTRA, a new architecture designed to improve how large language models process and reason about complex tables through adaptive semantic tree structures. The method combines tree-based navigation with symbolic code execution to achieve state-of-the-art performance on table question-answering benchmarks, addressing fundamental limitations in how tables are currently serialized for LLMs.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers propose Noise-Aware In-Context Learning (NAICL), a plug-and-play method to reduce hallucinations in auditory large language models without expensive fine-tuning. The approach uses a noise prior library to guide models toward more conservative outputs, achieving a 37% reduction in hallucination rates while establishing a new benchmark for evaluating audio understanding systems.
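NAICL's noise prior library is not specified beyond the summary; a plausible minimal realization matches the input's coarse spectral profile to a library entry and prepends that entry's conservative exemplar to the prompt. Everything below (the library entries, the 3-band signature, and the prompt template) is an illustrative assumption:

```python
import numpy as np

# Hypothetical noise-prior library: each entry pairs a noise "signature"
# (a coarse spectral profile) with an exemplar of conservative answering
# under that noise condition.
NOISE_LIBRARY = [
    {"name": "babble",  "signature": np.array([0.6, 0.3, 0.1]),
     "exemplar": "Audio: [overlapping speech] -> 'Speech present; content unclear.'"},
    {"name": "traffic", "signature": np.array([0.2, 0.3, 0.5]),
     "exemplar": "Audio: [engine noise] -> 'No intelligible speech detected.'"},
]

def spectral_signature(audio: np.ndarray, bands: int = 3) -> np.ndarray:
    """Coarse band-energy profile used to match the input to a noise prior."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    energy = np.array([c.sum() for c in np.array_split(spectrum, bands)])
    return energy / energy.sum()

def build_naicl_prompt(audio: np.ndarray, question: str) -> str:
    sig = spectral_signature(audio)
    prior = min(NOISE_LIBRARY, key=lambda e: np.linalg.norm(e["signature"] - sig))
    # Prepend the matched conservative exemplar; no fine-tuning required.
    return f"{prior['exemplar']}\nAudio: [input]\nQuestion: {question}\nAnswer:"

rng = np.random.default_rng(0)
print(build_naicl_prompt(rng.normal(size=16000), "What did the speaker say?"))
```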
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce ImageProtector, a user-side defense mechanism that embeds imperceptible perturbations into images to prevent multi-modal large language models from analyzing them. When adversaries attempt to extract sensitive information from protected images, MLLMs are induced to refuse analysis, though potential countermeasures exist that may partially mitigate the technique's effectiveness.
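The perturbation method is not detailed here, but user-side protections of this kind are typically computed with a PGD-style loop under an L-infinity budget. A sketch, where `model` and `refusal_loss` are hypothetical stand-ins for a surrogate MLLM and a loss that is low when the model's output is a refusal:

```python
import torch

def protect_image(image, model, refusal_loss, eps=8 / 255, steps=40, lr=1 / 255):
    """PGD-style sketch: find an imperceptible perturbation (L-inf <= eps)
    that pushes an MLLM toward refusing to analyze the image."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = refusal_loss(model, (image + delta).clamp(0, 1))
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # descend: make refusal more likely
            delta.clamp_(-eps, eps)          # keep the perturbation imperceptible
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```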
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce CONDESION-BENCH, a new benchmark for evaluating how large language models make decisions in complex, real-world scenarios with compositional actions and conditional constraints. The benchmark addresses limitations in existing decision-making frameworks by incorporating variable-level, contextual, and allocation-level restrictions that better reflect actual decision-making environments.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers present VLA-World, a vision-language-action model that combines predictive world modeling with reflective reasoning for autonomous driving. The system generates future frames guided by action trajectories and then reasons over imagined scenarios to refine predictions, achieving state-of-the-art performance on planning and future-generation benchmarks.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce CLIP-Inspector, a backdoor detection method for prompt-tuned CLIP models that reconstructs hidden triggers using out-of-distribution images to identify whether a model has been maliciously compromised. The technique achieves 94% detection accuracy and enables post-hoc model repair, addressing critical security vulnerabilities in outsourced machine learning services.
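Trigger reconstruction from out-of-distribution images is reminiscent of Neural-Cleanse-style trigger inversion; the sketch below shows that generic recipe (the objective, the mask regularizer, and the hyperparameters are assumptions, not CLIP-Inspector's published procedure):

```python
import torch

def reconstruct_trigger(model, ood_images, target_class, steps=200, lam=1e-2):
    """Generic trigger-inversion sketch: optimize a patch + mask so that
    OOD images are pushed into `target_class`. An unusually small
    recovered mask is evidence of a planted backdoor trigger."""
    mask = torch.zeros(ood_images.shape[-2:], requires_grad=True)  # (H, W)
    patch = torch.rand(ood_images.shape[1:], requires_grad=True)   # (C, H, W)
    opt = torch.optim.Adam([mask, patch], lr=0.05)
    labels = torch.full((len(ood_images),), target_class)
    for _ in range(steps):
        m = torch.sigmoid(mask)                     # keep the mask in [0, 1]
        stamped = (1 - m) * ood_images + m * patch  # apply the candidate trigger
        loss = torch.nn.functional.cross_entropy(model(stamped), labels)
        loss = loss + lam * m.abs().sum()           # prefer small triggers
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), patch.detach()
```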
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers propose Interactive ASR, a new framework that combines semantic-aware evaluation using LLM-as-a-Judge with multi-turn interactive correction to improve automatic speech recognition beyond traditional word error rate metrics. The approach simulates human-like interaction, enabling iterative refinement of recognition outputs across English, Chinese, and code-switching datasets.
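The interaction loop can be summarized in a few lines: transcribe, let a judge grade semantic adequacy, and revise with the judge's feedback until it passes. All three helpers below are hypothetical interfaces, not the paper's API:

```python
def interactive_asr(transcribe, judge, correct, audio, max_turns=3):
    """Judge-guided iterative ASR refinement (sketch).

    transcribe(audio)    -> hypothesis string
    judge(audio, hyp)    -> (score in [0, 1], textual feedback)
    correct(hyp, feedback) -> revised hypothesis
    """
    hyp = transcribe(audio)
    for _ in range(max_turns):
        score, feedback = judge(audio, hyp)
        if score >= 0.9:  # semantically acceptable; stop early
            break
        hyp = correct(hyp, feedback)
    return hyp
```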
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers developed PharmaSim Switch, an AI-powered educational platform that uses large language models to scaffold diagnostic reasoning in pharmacy technician training through two distinct pedagogical approaches: structuring and problematizing. A 63-student experiment found both methods effective, with structuring promoting more accurate participation and problematizing encouraging deeper constructive engagement, suggesting hybrid scaffolding strategies optimize learning outcomes.
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers introduce GRM, a frequency-selective jailbreak framework that exploits vulnerabilities in audio large language models while preserving utility. By strategically perturbing specific frequency bands rather than entire spectrums, GRM achieves 88.46% jailbreak success rates with better trade-offs between attack effectiveness and transcription quality compared to existing methods.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers propose Visually-Guided Policy Optimization (VGPO), a framework that enhances vision-language models' ability to focus on visual information during reasoning tasks. The method addresses a fundamental limitation where text-dominated VLMs suffer from weak visual attention and temporal visual forgetting, improving performance on multimodal reasoning and visual-dependent tasks.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠 Researchers present the AI Codebase Maturity Model (ACMM), a 5-level framework for systematically evolving codebases from basic AI-assisted coding to self-sustaining systems. Validated through a 4-month case study of KubeStellar Console, the model demonstrates that AI system intelligence depends primarily on surrounding infrastructure (testing, metrics, and feedback loops) rather than the AI model itself.
🏢 Microsoft · 🧠 Claude · 🧠 Copilot
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠 A research paper proposes that generative AI licensing requires nuanced, conditional consent rather than binary opt-in/opt-out frameworks. The study argues inference-time verification can better balance rights holders' interests with AI developers' capabilities, using music licensing as a practical case study to demonstrate how contextual consent conditions can be enforced.