12,730 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.
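Activation steering of this kind is typically implemented by adding a scaled direction vector to a hidden-layer activation, where the direction is the mean activation difference between emotion-eliciting and neutral prompts. The sketch below is a minimal toy illustration of that idea; the activations, layer choice, and `alpha` value are assumptions, not the paper's configuration.

```python
import numpy as np

def steer_activation(hidden, emotion_vec, alpha=0.5):
    """Add a scaled steering vector to a hidden-state activation.

    hidden: (d,) activation at some transformer layer
    emotion_vec: (d,) direction derived from emotion-eliciting texts
    alpha: steering strength (hypothetical value)
    """
    return hidden + alpha * emotion_vec

# Toy example: the steering vector is commonly taken as the mean
# difference between activations on emotional vs. neutral prompts.
neutral_acts = np.array([[0.1, 0.2], [0.3, 0.0]])
emotional_acts = np.array([[0.5, 0.9], [0.7, 0.7]])
emotion_vec = emotional_acts.mean(axis=0) - neutral_acts.mean(axis=0)

steered = steer_activation(np.array([1.0, 1.0]), emotion_vec, alpha=0.5)
```

At inference time the same vector would be added at a chosen layer on every forward pass, perturbing the model toward the elicited emotional state.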
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce Step-Saliency, a diagnostic tool that reveals how large reasoning models fail during multi-step reasoning tasks by identifying two critical information-flow breakdowns: shallow layers that ignore context and deep layers that lose focus on reasoning. They propose StepFlow, a test-time intervention that repairs these flows and improves model accuracy without retraining.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠AgentGate introduces a lightweight routing engine that optimizes how AI agents communicate and dispatch tasks across distributed systems by treating routing as a constrained decision problem rather than open-ended text generation. The system uses a two-stage approach (action decision followed by structural grounding) and demonstrates that compact 3B-7B parameter models can achieve competitive performance while operating under resource, latency, and privacy constraints.
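Treating routing as a constrained decision problem means the model scores a closed set of candidate actions instead of free-generating text, and a grounding step validates the chosen dispatch against the target's schema. A minimal sketch under assumed names (`route`, `ground`, and a toy scorer standing in for a compact model):

```python
def route(task, actions, score_fn):
    """Stage 1: action decision — pick the best action from a closed
    set rather than generating free-form text."""
    return max(actions, key=lambda a: score_fn(task, a))

def ground(action, schema):
    """Stage 2: structural grounding — accept a dispatch only if it
    carries every field the target agent's schema expects."""
    return all(field in action for field in schema)

# Toy lexical scorer standing in for a small model's scoring head.
def toy_score(task, action):
    return len(set(task.split()) & set(action["tags"]))

actions = [
    {"name": "search", "tags": {"find", "lookup"}, "agent": "retriever"},
    {"name": "summarize", "tags": {"condense", "brief"}, "agent": "writer"},
]
chosen = route("lookup the latest report", actions, toy_score)
ok = ground(chosen, schema=("name", "agent"))
```

Because the output space is enumerable, a small model only has to rank a handful of options, which is what makes 3B-7B models competitive here.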
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers have developed a method to control how detectable hallucinations are in multimodal language models, distinguishing obvious hallucinations (easily detected by humans) from elusive ones (harder to spot). Using a dataset of 4,470 human responses, they created targeted interventions that tune which types of hallucinations occur, enabling flexible control suited to different security and usability requirements.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose using Inductive Learning of Answer Set Programs (ILASP) to create interpretable approximations of neural networks trained on preference learning tasks. The approach combines dimensionality reduction through Principal Component Analysis with logic-based explanations, addressing the challenge of explaining black-box AI models while maintaining computational efficiency.
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce EmoMAS, a Bayesian multi-agent framework that enables small language models to perform sophisticated negotiation by treating emotional intelligence as a strategic variable. The system coordinates game-theoretic, reinforcement learning, and psychological agents to optimize negotiation outcomes while maintaining privacy through edge deployment, demonstrating performance comparable to larger models across high-stakes domains.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce CAFP, a post-processing framework that mitigates algorithmic bias by averaging predictions across factual and counterfactual versions of inputs where sensitive attributes are flipped. The model-agnostic approach eliminates the need for retraining or architectural modifications, making fairness interventions practical for deployed systems in high-stakes domains like credit scoring and criminal justice.
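The counterfactual-averaging idea can be sketched in a few lines: score the input once per possible value of the sensitive attribute and average the results, wrapping any trained model without retraining it. The scorer and attribute names below are hypothetical.

```python
def cafp_predict(model, x, sensitive_key, values):
    """Counterfactual-averaged post-processing: average the model's
    score over copies of x with the sensitive attribute set to each
    possible value. Model-agnostic; no retraining required."""
    preds = [model(dict(x, **{sensitive_key: v})) for v in values]
    return sum(preds) / len(preds)

# Toy biased scorer: penalizes group "B" directly.
def biased_model(x):
    base = 0.7 if x["income"] > 50 else 0.4
    return base - (0.2 if x["group"] == "B" else 0.0)

applicant_b = {"income": 60, "group": "B"}
applicant_a = {"income": 60, "group": "A"}
score_b = cafp_predict(biased_model, applicant_b, "group", ["A", "B"])
score_a = cafp_predict(biased_model, applicant_a, "group", ["A", "B"])
```

By construction the averaged score is identical for otherwise-equal applicants who differ only in the sensitive attribute, which is the fairness property being traded for a small shift in raw scores.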
🏢 Meta
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce A-MBER, a benchmark dataset designed to evaluate AI assistants' ability to recognize emotions based on long-term interaction history rather than immediate context. The benchmark tests whether models can retrieve relevant past interactions, infer current emotional states, and provide grounded explanations—revealing that memory's value lies in selective, context-aware interpretation rather than simple historical volume.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose T-STAR, a novel reinforcement learning framework that structures multi-step agent trajectories as trees rather than independent chains, enabling better credit assignment for LLM agents. The method uses tree-based reward propagation and surgical policy optimization to improve reasoning performance across embodied, interactive, and planning tasks.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce a declarative runtime protocol that externalizes agent state to measure how much of an LLM-based agent's competence actually derives from the language model versus explicit structural components. Testing on Collaborative Battleship, they find that explicit world-model planning drives most performance gains, while sparse LLM-based revision at 4.3% of turns yields minimal and sometimes negative returns.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce AI-Sinkhole, an AI-agent augmented DNS-blocking framework that dynamically detects and temporarily blocks LLM chatbot services during proctored exams to prevent academic integrity violations. The system uses quantized LLMs for semantic classification and Pi-Hole for network-wide DNS blocking, achieving robust cross-lingual detection with F1-scores exceeding 0.83.
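The time-boxed blocking half of such a system (separate from the semantic classifier) can be sketched as a deny list that is populated only while an exam window is active, in the style of a Pi-hole network-wide DNS block. The domain names and window representation below are illustrative, not from the paper.

```python
# Hypothetical chatbot domains; a real deployment would maintain a
# curated, regularly updated list.
CHATBOT_DOMAINS = {"chat.example-llm.com", "assistant.example-ai.net"}

def in_exam_window(now, windows):
    """Return True if the current time falls inside any proctored-exam
    window (each window is a (start, end) pair on the same clock)."""
    return any(start <= now <= end for start, end in windows)

def active_blocklist(now, windows):
    """Block chatbot domains only while an exam is running, so the
    services are restored automatically once the window closes."""
    return set(CHATBOT_DOMAINS) if in_exam_window(now, windows) else set()

blocked = active_blocklist(now=10.5, windows=[(10.0, 12.0)])
after_exam = active_blocklist(now=13.0, windows=[(10.0, 12.0)])
```

In the paper's framing, the quantized LLM classifier decides which queries and services count as chatbot traffic; the list it produces would feed a gate like this.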
AI · Bearish · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers identified a critical robustness vulnerability in Qwen3-embedding models for conversational retrieval, where structured dialogue noise becomes disproportionately retrievable and contaminates search results. The problem remains invisible under standard benchmarks but is significantly more pronounced in Qwen3 than competing models, though lightweight query prompting effectively mitigates it.
AI · Neutral · arXiv – CS AI · Apr 10 · 5/10
🧠Researchers have developed an interactive visualization system that displays the complete 181,440-state space of the 8-puzzle problem using GPU-based rendering, enabling students to explore search algorithm behavior in real-time. The system demonstrates that full state-space visualization is technically feasible and educationally valuable for AI education, bridging abstract algorithmic concepts with concrete puzzle manipulation.
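The 181,440 figure follows because only tile arrangements of one parity are reachable from the goal, i.e. 9!/2 of the 362,880 permutations of the 3×3 board. A short breadth-first search from the goal state confirms the count:

```python
from collections import deque
from math import factorial

def neighbors(state):
    """Yield states reachable by sliding one tile into the blank (0)."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

# BFS from the goal enumerates every reachable configuration.
goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
seen, queue = {goal}, deque([goal])
while queue:
    for nxt in neighbors(queue.popleft()):
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)

# Exactly half of the 9! permutations are reachable: 181,440 states.
reachable = len(seen)
```

This is the same enumeration the visualization system renders, just without the GPU layout.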
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose a composite architecture combining instruction-based refusal with a structural abstention gate to reduce hallucinations in large language models. The system uses a support deficit score derived from self-consistency, paraphrase stability, and citation coverage to block unreliable outputs, achieving better accuracy than either mechanism alone across multiple models.
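One plausible reading of the support deficit score is a weighted complement of the three support signals, gated by a threshold. The sketch below follows that reading; the equal weights and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
def support_deficit(self_consistency, paraphrase_stability,
                    citation_coverage, weights=(1/3, 1/3, 1/3)):
    """Combine three support signals (each in [0, 1]) into a deficit
    score; a higher deficit means weaker evidence for the answer."""
    signals = (self_consistency, paraphrase_stability, citation_coverage)
    support = sum(w * s for w, s in zip(weights, signals))
    return 1.0 - support

def abstention_gate(answer, deficit, threshold=0.5):
    """Structural gate: block outputs whose support deficit exceeds
    the threshold, returning None instead of an unreliable answer."""
    return answer if deficit <= threshold else None

deficit = support_deficit(0.9, 0.8, 0.7)  # strong support, low deficit
answer = abstention_gate("Paris", deficit)
```

The composite architecture in the paper pairs this structural gate with instruction-based refusal, so an answer must pass both before being emitted.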
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers present CGD-PD, a test-time decoding method that improves large language models' performance on three-way logical question answering (True/False/Unknown) by enforcing negation consistency and resolving epistemic uncertainty through targeted entailment probes. The approach achieves up to 16% relative accuracy improvements on the FOLIO benchmark while reducing spurious Unknown predictions.
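Negation consistency requires the label distribution on a statement to mirror the distribution on its negation (P(True|q) ≈ P(False|¬q), and vice versa); an entailment probe on the negation can then resolve a hedged Unknown. A sketch with hypothetical probability tables and assumed tolerances:

```python
def negation_consistent(p_q, p_not_q, tol=0.15):
    """Check that distributions over a statement q and its negation
    are mirror images within an assumed tolerance."""
    return (abs(p_q["True"] - p_not_q["False"]) <= tol
            and abs(p_q["False"] - p_not_q["True"]) <= tol)

def resolve_unknown(p_q, p_not_q, margin=0.2):
    """If the model hedges with Unknown but a probe on the negation
    is decisive, flip to the decisive label."""
    top = max(p_q, key=p_q.get)
    if top != "Unknown":
        return top
    if p_not_q["False"] - p_not_q["True"] > margin:
        return "True"   # negation confidently false => statement true
    if p_not_q["True"] - p_not_q["False"] > margin:
        return "False"
    return "Unknown"

p_q = {"True": 0.3, "False": 0.2, "Unknown": 0.5}       # hedged
p_not_q = {"True": 0.1, "False": 0.8, "Unknown": 0.1}   # decisive
label = resolve_unknown(p_q, p_not_q)
```

The example pair above is exactly the spurious-Unknown case the paper targets: the model hedges on q even though its own view of ¬q is decisive.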
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce Text2DistBench, a new benchmark for evaluating how well large language models understand distributional information—like trends and preferences across text collections—rather than just factual details. Built from YouTube comments about movies and music, the benchmark reveals that while LLMs outperform random baselines, their performance varies significantly across different distribution types, highlighting both capabilities and gaps in current AI systems.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose an ethical framework for sensor-fused health AI agents that combine biometric data with large language models. The paper identifies critical risks at the user-facing layer where sensor data is translated into health guidance, arguing that the perceived objectivity of biometrics can mask AI errors and turn them into harmful medical directives.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce SensorPersona, an LLM-based system that continuously extracts user personas from mobile sensor data rather than chat histories, achieving 31.4% higher recall in persona extraction and 85.7% win rate in personalized agent responses. The system processes multimodal sensor streams to infer physical patterns, psychosocial traits, and life experiences across longitudinal data collected from 20 participants over three months.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠A research study analyzes six leading large language models to identify shared cultural patterns revealed in their training data, finding consensus around themes like narrative meaning-making, status competition, and moral rationalization. The findings suggest LLMs function as 'cultural condensates' that compress how humans describe and contest their social lives across massive text datasets.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers conducted a comparative analysis of demonstration selection strategies for using large language models to predict users' next point-of-interest (POI) based on historical location data. The study found that simple heuristic methods like geographical proximity and temporal ordering outperform complex embedding-based approaches in both computational efficiency and prediction accuracy, with LLMs using these heuristics sometimes matching fine-tuned model performance without additional training.
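The geographic-proximity heuristic amounts to ranking a user's past check-ins by distance to the current location and taking the nearest k as in-context demonstrations, with no embedding model in the loop. A minimal sketch (the POI records and locations are invented):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def select_demonstrations(history, current_loc, k=2):
    """Geographic-proximity heuristic: use the k past check-ins
    closest to the current location as in-context examples."""
    return sorted(history,
                  key=lambda v: haversine_km(v["loc"], current_loc))[:k]

history = [
    {"poi": "cafe",    "loc": (48.8566, 2.3522)},  # central Paris
    {"poi": "museum",  "loc": (48.8606, 2.3376)},  # Louvre area
    {"poi": "airport", "loc": (49.0097, 2.5479)},  # CDG
]
demos = select_demonstrations(history, current_loc=(48.8584, 2.2945))
```

The selected demonstrations are then placed in the LLM prompt ahead of the prediction query; the study's point is that this cheap ranking can match embedding-based retrieval.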
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce DOVE, a distributional evaluation framework that measures how well large language models align with cultural values through open-ended text generation rather than multiple-choice tests. The framework uses rate-distortion optimization to create a value codebook and unbalanced optimal transport to assess alignment, demonstrating 31.56% correlation with downstream tasks across 12 LLMs while requiring only 500 samples per culture.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce chain-of-illocution (CoI) prompting to improve source faithfulness in retrieval-augmented language models, achieving up to 63% gains in source adherence for programming education tasks. The study reveals that standard RAG systems exhibit low fidelity to source materials, with non-RAG models performing worse, while a user study confirms improved faithfulness does not compromise user satisfaction.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠A research paper proposes adaptive risk management frameworks for governing frontier AI in public sectors through 2030, arguing that static compliance models are insufficient given rapid capability advancement and incomplete knowledge of AI harms. The work emphasizes that effective governance requires organizational redesign, stronger policy capacity, and scenario-aware regulation rather than purely technical solutions.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers studying 469 Canadian youth aged 16-24 developed a negotiation-based framework to understand privacy decision-making with smart voice assistants, introducing two tension indices (RBTI and CATI) that measure competing risk-benefit and control-acceptance pressures. The study reveals that frequent SVA users exhibit benefit-dominant profiles and accept convenience trade-offs, suggesting the privacy paradox reflects negotiation rather than inconsistency.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce DISSECT, a 12,000-question diagnostic benchmark that reveals a critical "perception-integration gap" in Vision-Language Models—where VLMs successfully extract visual information but fail to reason about it during downstream tasks. Testing 18 VLMs across Chemistry and Biology shows open-source models systematically struggle with integrating visual input into reasoning, while closed-source models demonstrate superior integration capabilities.