🧠

AI

21,014 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.


AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Neural Computers

Researchers propose Neural Computers (NCs), a new computing paradigm where AI models function as executable runtime environments rather than static predictors. The work demonstrates early NC prototypes using video models that process instructions and user actions to generate screen frames, establishing foundational I/O primitives while identifying significant challenges toward achieving general-purpose Completely Neural Computers (CNCs).

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries

Researchers evaluated whether large language models understand long-form narratives the way humans do by comparing summaries of 150 novels produced by human writers and by nine state-of-the-art LLMs. The study found that LLMs focus disproportionately on story endings rather than distributing attention across the narrative as human readers do, revealing gaps in narrative comprehension despite expanded context windows.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Say Something Else: Rethinking Contextual Privacy as Information Sufficiency

Researchers formalize privacy-preserving communication for LLM agents by introducing Information Sufficiency (IS) as a framework and proposing free-text pseudonymization as a third privacy strategy alongside suppression and generalization. Evaluation across 792 scenarios reveals that pseudonymization offers superior privacy-utility tradeoffs, and that multi-turn conversational testing exposes significant privacy leakage missed by single-message assessments.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

"Don't Be Afraid, Just Learn": Insights from Industry Practitioners to Prepare Software Engineers in the Age of Generative AI

A study of 51 industry practitioners reveals that generative AI integration into software development has created a significant gap between university curricula and industry hiring expectations. The research identifies new required skills like prompting and output evaluation, while emphasizing that soft skills and traditional competencies remain critical for modern software engineers.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

🧠

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Researchers introduce MAT-Cell, a neuro-symbolic AI framework that combines large language models with biological constraints to improve single-cell annotation accuracy. The system uses multi-agent reasoning and verification processes to overcome limitations in both supervised learning and LLM-based approaches, demonstrating superior performance on cross-species benchmarks.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models

Researchers propose an attribution-driven approach to make encoder-based Large Language Models more transparent and trustworthy for network intrusion detection in Software-Defined Networks. By analyzing which traffic features drive model decisions, the study demonstrates that LLMs learn legitimate attack behavior patterns, addressing a critical barrier to deploying AI security tools in sensitive environments.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads

Researchers investigate in-context learning (ICL) in speech language models, revealing that speaking rate significantly affects model performance and acoustic mimicry, while induction heads play the same causal role they play in text-based ICL. The study bridges the gap between text and speech domains by analyzing how models learn from demonstrations in text-to-speech tasks.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models

Researchers propose IAMFM, a framework that combines game-theoretic incentives with optimization algorithms to improve how ads are placed in LLM-generated content while controlling computational costs. The approach guarantees that strategic advertisers behave honestly and introduces a novel "warm-start" method for efficient payment calculations in complex ad auctions.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

🧠

S³: Stratified Scaling Search for Test-Time in Diffusion Language Models

Researchers introduce S³ (Stratified Scaling Search), a test-time scaling method for diffusion language models that improves output quality by reallocating compute during the denoising process rather than simple best-of-K sampling. The technique uses a lightweight verifier to evaluate and selectively resample candidate trajectories at each step, demonstrating consistent performance gains across mathematical reasoning and knowledge tasks without requiring model retraining.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs

Researchers introduce DISSECT, a 12,000-question diagnostic benchmark that reveals a critical "perception-integration gap" in Vision-Language Models—where VLMs successfully extract visual information but fail to reason about it during downstream tasks. Testing 18 VLMs across Chemistry and Biology shows open-source models systematically struggle with integrating visual input into reasoning, while closed-source models demonstrate superior integration capabilities.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

🧠

FLeX: Fourier-based Low-rank EXpansion for multilingual transfer

Researchers propose FLeX, a parameter-efficient fine-tuning approach combining LoRA, advanced optimizers, and Fourier-based regularization to enable cross-lingual code generation across programming languages. The method achieves 42.1% pass@1 on Java tasks compared to a 34.2% baseline, demonstrating significant improvements in multilingual transfer without full model retraining.

🧠 Llama
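The pass@1 figures above are benchmark averages. The summary does not say how FLeX computed them; a common choice for code-generation benchmarks is the unbiased pass@k estimator, where n samples are generated per problem and c of them pass the tests. The sketch below assumes that definition (the `pass_at_k` name and the per-problem counts are illustrative) and also works out the gap between the two reported Java scores.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the chance that at least one of k samples, drawn
    from n generated samples of which c pass the tests, is correct."""
    if n - c < k:
        # Every size-k subset must then contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the fraction of passing samples, c / n.
# Hypothetical counts of passing samples (out of 10) for four problems.
per_problem = [pass_at_k(n=10, c=c, k=1) for c in (3, 5, 0, 10)]
benchmark_pass_at_1 = sum(per_problem) / len(per_problem)

# Gap between the reported FLeX and baseline Java pass@1 scores.
absolute_gain = 42.1 - 34.2                  # percentage points
relative_gain = absolute_gain / 34.2 * 100   # percent over the baseline
print(round(benchmark_pass_at_1, 3), round(absolute_gain, 1), round(relative_gain, 1))
# prints: 0.45 7.9 23.1
```

Read this way, the reported jump is 7.9 percentage points, or roughly a 23% relative improvement over the baseline.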

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Illocutionary Explanation Planning for Source-Faithful Explanations in Retrieval-Augmented Language Models

Researchers introduce chain-of-illocution (CoI) prompting to improve source faithfulness in retrieval-augmented language models, achieving up to 63% gains in source adherence for programming education tasks. The study reveals that standard RAG systems exhibit low fidelity to source materials, with non-RAG models performing worse, while a user study confirms improved faithfulness does not compromise user satisfaction.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Governing frontier general-purpose AI in the public sector: adaptive risk management and policy capacity under uncertainty through 2030

A research paper proposes adaptive risk management frameworks for governing frontier AI in public sectors through 2030, arguing that static compliance models are insufficient given rapid capability advancement and incomplete knowledge of AI harms. The work emphasizes that effective governance requires organizational redesign, stronger policy capacity, and scenario-aware regulation rather than purely technical solutions.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Researchers studying 469 Canadian youth aged 16–24 developed a negotiation-based framework to understand privacy decision-making with smart voice assistants (SVAs), introducing two tension indices, RBTI and CATI, that measure competing risk-benefit and control-acceptance pressures. The study reveals that frequent SVA users exhibit benefit-dominant profiles and accept convenience trade-offs, suggesting the privacy paradox reflects negotiation rather than inconsistency.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook

Researchers introduce DOVE, a distributional evaluation framework that measures how well large language models align with cultural values through open-ended text generation rather than multiple-choice tests. The framework uses rate-distortion optimization to create a value codebook and unbalanced optimal transport to assess alignment, demonstrating 31.56% correlation with downstream tasks across 12 LLMs while requiring only 500 samples per culture.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models

Researchers introduce Text2DistBench, a new benchmark for evaluating how well large language models understand distributional information—like trends and preferences across text collections—rather than just factual details. Built from YouTube comments about movies and music, the benchmark reveals that while LLMs outperform random baselines, their performance varies significantly across different distribution types, highlighting both capabilities and gaps in current AI systems.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

🧠

Robustness Risk of Conversational Retrieval: Identifying and Mitigating Noise Sensitivity in Qwen3-Embedding Model

Researchers identified a critical robustness vulnerability in Qwen3-Embedding models for conversational retrieval, where structured dialogue noise becomes disproportionately retrievable and contaminates search results. The problem remains invisible under standard benchmarks and is significantly more pronounced in Qwen3 than in competing models, though lightweight query prompting effectively mitigates it.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations

Researchers introduce AI-Sinkhole, an AI-agent augmented DNS-blocking framework that dynamically detects and temporarily blocks LLM chatbot services during proctored exams to prevent academic integrity violations. The system uses quantized LLMs for semantic classification and Pi-hole for network-wide DNS blocking, achieving robust cross-lingual detection with F1-scores exceeding 0.83.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

A-MBER: Affective Memory Benchmark for Emotion Recognition

Researchers introduce A-MBER, a benchmark dataset designed to evaluate AI assistants' ability to recognize emotions based on long-term interaction history rather than immediate context. The benchmark tests whether models can retrieve relevant past interactions, infer current emotional states, and provide grounded explanations—revealing that memory's value lies in selective, context-aware interpretation rather than simple historical volume.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

Researchers propose T-STAR, a novel reinforcement learning framework that structures multi-step agent trajectories as trees rather than independent chains, enabling better credit assignment for LLM agents. The method uses tree-based reward propagation and surgical policy optimization to improve reasoning performance across embodied, interactive, and planning tasks.
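The summary does not spell out T-STAR's propagation rule. As a generic illustration of why trees help credit assignment, the sketch below merges trajectories that share action prefixes, scores each node by the mean return of the trajectories passing through it, and gives each step an advantage equal to the value of the node it reaches minus the value of its parent. The `Node`, `build_tree`, and `step_advantages` names and the toy web-shopping actions are hypothetical; this is not the paper's method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    # Children keyed by the action taken at this step.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Returns of every trajectory that passes through this node.
    returns: List[float] = field(default_factory=list)

    def value(self) -> float:
        return sum(self.returns) / len(self.returns)


def build_tree(trajectories: List[Tuple[List[str], float]]) -> Node:
    """Merge trajectories that share action prefixes into one tree."""
    root = Node()
    for actions, ret in trajectories:
        node = root
        node.returns.append(ret)
        for action in actions:
            node = node.children.setdefault(action, Node())
            node.returns.append(ret)
    return root


def step_advantages(root: Node, actions: List[str]) -> List[float]:
    """Per-step advantage: value of the node reached minus value of its parent."""
    advantages = []
    node = root
    for action in actions:
        child = node.children[action]
        advantages.append(child.value() - node.value())
        node = child
    return advantages


# Hypothetical rollouts: (action sequence, final return).
trajectories = [
    (["search", "click", "buy"], 1.0),
    (["search", "click", "back"], 0.0),
    (["search", "scroll"], 0.0),
    (["ask"], 0.0),
]
root = build_tree(trajectories)
print(step_advantages(root, ["search", "click", "buy"]))
```

Because the advantages telescope, they sum to the trajectory's return minus the root's mean return (1.0 − 0.25 = 0.75). The shared `search` prefix earns a small credit (1/12), while the decisive `buy` step earns the most (0.5), which independent chains with one scalar reward per rollout cannot distinguish.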

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

CAFP: A Post-Processing Framework for Group Fairness via Counterfactual Model Averaging

Researchers introduce CAFP, a post-processing framework that mitigates algorithmic bias by averaging predictions across factual and counterfactual versions of inputs where sensitive attributes are flipped. The model-agnostic approach eliminates the need for retraining or architectural modifications, making fairness interventions practical for deployed systems in high-stakes domains like credit scoring and criminal justice.

🏢 Meta
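The counterfactual averaging step can be sketched for a single binary sensitive attribute. This is a minimal illustration assuming a model that scores a feature dictionary and an attribute encoded as 0/1; the feature names and the deliberately biased `toy_model` are hypothetical, and CAFP itself may construct or weight counterfactuals differently.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def counterfactual_average(
    predict: Callable[[Features], float],
    x: Features,
    sensitive: str = "group",
) -> float:
    """Average the model's score on the factual input and on a copy whose
    binary sensitive attribute is flipped (0 <-> 1)."""
    x_cf = dict(x)
    x_cf[sensitive] = 1.0 - x[sensitive]
    return 0.5 * (predict(x) + predict(x_cf))


def toy_model(x: Features) -> float:
    # Hypothetical, deliberately biased scorer: adds 0.2 when group == 1.
    return min(1.0, 0.5 * x["income"] + 0.2 * x["group"])


applicants: List[Features] = [
    {"income": 0.8, "group": 0.0},
    {"income": 0.8, "group": 1.0},
]
raw_scores = [toy_model(a) for a in applicants]
fair_scores = [counterfactual_average(toy_model, a) for a in applicants]
print(raw_scores, fair_scores)
```

Both applicants share the same income, so after averaging they receive identical scores (0.5) even though the raw model favors group 1 by 0.2. No retraining is involved: the wrapper only calls the existing `predict` function twice per input, which is what makes a post-processing approach usable on deployed models.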

AI · Neutral · arXiv – CS AI · Apr 10 · 5/10

🧠

Full State-Space Visualisation of the 8-Puzzle: Feasibility, Design, and Educational Use

Researchers have developed an interactive visualization system that displays the complete 181,440-state space of the 8-puzzle problem using GPU-based rendering, enabling students to explore search algorithm behavior in real-time. The system demonstrates that full state-space visualization is technically feasible and educationally valuable for AI education, bridging abstract algorithmic concepts with concrete puzzle manipulation.
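The 181,440 figure is half of the 9! = 362,880 tile arrangements: sliding moves preserve a parity invariant, so only half of all arrangements are reachable from any given start. A breadth-first search from the goal, the same enumeration a full state-space layout has to perform, confirms the count. This is a plain Python sketch, not the authors' GPU renderer; the tuple encoding with 0 as the blank is an assumption.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # row-major 3x3 board, 0 marks the blank


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)


def reachable_states(start=GOAL):
    """Breadth-first search; maps each reachable state to its move distance."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in neighbors(s):
            if n not in depth:
                depth[n] = depth[s] + 1
                queue.append(n)
    return depth


depth = reachable_states()
print(len(depth))           # 181440
print(max(depth.values()))  # 31
```

The maximum BFS depth, 31 moves, is the diameter of the 8-puzzle state graph, so a full visualization spans 32 BFS layers from the goal outward.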

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

How Much LLM Does a Self-Revising Agent Actually Need?

Researchers introduce a declarative runtime protocol that externalizes agent state to measure how much of an LLM-based agent's competence actually derives from the language model versus explicit structural components. Testing on Collaborative Battleship, they find that explicit world-model planning drives most performance gains, while sparse LLM-based revision, invoked on only 4.3% of turns, yields minimal and sometimes negative returns.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

🧠

EmoMAS: Emotion-Aware Multi-Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Orchestration

Researchers introduce EmoMAS, a Bayesian multi-agent framework that enables small language models to perform sophisticated negotiation by treating emotional intelligence as a strategic variable. The system coordinates game-theoretic, reinforcement learning, and psychological agents to optimize negotiation outcomes while maintaining privacy through edge deployment, demonstrating performance comparable to larger models across high-stakes domains.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Consistency-Guided Decoding with Proof-Driven Disambiguation for Three-Way Logical Question Answering

Researchers present CGD-PD, a test-time decoding method that improves large language models' performance on three-way logical question answering (True/False/Unknown) by enforcing negation consistency and resolving epistemic uncertainty through targeted entailment probes. The approach achieves up to 16% relative accuracy improvements on the FOLIO benchmark while reducing spurious Unknown predictions.
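The negation-consistency idea can be illustrated with label probabilities for a hypothesis H and its negation: a True answer for H implies False for not-H, and Unknown implies Unknown. The sketch scores each candidate label by the product of the two logically consistent probabilities and picks the best. The probability values are made up, the `consistent_label` name is hypothetical, and the proof-driven disambiguation step (entailment probes) is not modeled.

```python
from typing import Dict

LABELS = ("True", "False", "Unknown")
# Negating a hypothesis swaps True and False and leaves Unknown unchanged.
NEGATE = {"True": "False", "False": "True", "Unknown": "Unknown"}


def consistent_label(p_h: Dict[str, float], p_not_h: Dict[str, float]) -> str:
    """Pick the label for H whose implied label for not-H is also likely,
    scoring each candidate by the product of the two probabilities."""
    def score(label: str) -> float:
        return p_h[label] * p_not_h[NEGATE[label]]
    return max(LABELS, key=score)


# Made-up model outputs for a hypothesis H and its negation.
p_h = {"True": 0.40, "False": 0.15, "Unknown": 0.45}
p_not_h = {"True": 0.05, "False": 0.70, "Unknown": 0.25}

print(max(p_h, key=p_h.get))           # Unknown  (H judged alone)
print(consistent_label(p_h, p_not_h))  # True     (negation-consistent)
```

Taken alone, the H distribution would yield Unknown (0.45); requiring agreement with the confident False answer for not-H flips the decision to True (0.40 × 0.70 = 0.28 versus 0.45 × 0.25 ≈ 0.11), the kind of spurious Unknown prediction the method aims to reduce.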
