12,713 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose CPMI, an automated method for training process reward models that reduces annotation costs by 84% and computational overhead by 98% compared to traditional Monte Carlo approaches. The technique uses contrastive mutual information to assign reward scores to reasoning steps in AI chain-of-thought trajectories without expensive human annotation or repeated LLM rollouts.
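A highly schematic illustration of the idea, not the paper's method: a mutual-information-style credit for a single reasoning step can be estimated by comparing the model's log-probability of the correct answer with and without that step, which needs only forced-decoding passes instead of repeated rollouts. The `answer_logprob` helper below is hypothetical.

```python
# Schematic sketch only; CPMI's contrastive formulation is in the paper.
# `answer_logprob` is a hypothetical helper returning log p(answer | prompt)
# from a single forced-decoding pass of the model.
def step_score(answer_logprob, context: str, step: str, answer: str) -> float:
    """Pointwise MI-flavoured credit for one chain-of-thought step."""
    with_step = answer_logprob(prompt=context + "\n" + step, answer=answer)
    without_step = answer_logprob(prompt=context, answer=answer)
    return with_step - without_step  # > 0: the step made the answer more likely

# Toy stand-in scorer so the sketch executes; a real scorer would call the LLM.
toy = lambda prompt, answer: -len(answer) / (1 + prompt.count("="))
print(step_score(toy, "Q: 17 + 25 = ?", "17 + 25 = 42", "42"))
```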
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Skill-SD, a novel training framework for multi-turn LLM agents that improves sample efficiency by converting successful agent trajectories into dynamic natural language skills that condition a teacher model. The approach combines reinforcement learning with self-distillation and achieves significant performance improvements over baseline methods on benchmark tasks.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Critical-CoT, a defense framework that protects large language models against reasoning-level backdoor attacks by fine-tuning models to develop critical thinking behaviors. Unlike token-level backdoors, these attacks inject malicious reasoning steps into chain-of-thought processes, making them harder to detect; the proposed defense demonstrates strong robustness across multiple LLMs and datasets.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose CanaryRAG, a runtime defense mechanism that protects Retrieval-Augmented Generation systems from adversarial attacks that extract proprietary data from knowledge bases. The solution uses embedded canary tokens to detect leakage in real-time while maintaining normal system performance, offering a practical safeguard for organizations deploying RAG-based AI systems.
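A minimal sketch of the canary-token idea as described above, assuming the defense plants unique marker strings in knowledge-base documents at ingestion time and scans generated responses for them at serving time; the names and formats here are illustrative, not the paper's implementation.

```python
import secrets

def plant_canary(doc_text: str) -> tuple[str, str]:
    """Append a unique, unlikely-to-occur marker string to a KB document."""
    canary = f"CANARY-{secrets.token_hex(8)}"
    return doc_text + f"\n<!-- {canary} -->", canary

def leaked_canaries(response: str, canaries: set[str]) -> set[str]:
    """Return any planted canaries that appear verbatim in a model response."""
    return {c for c in canaries if c in response}

# Plant canaries at ingestion time, then check every generation at runtime.
docs = ["Internal pricing policy ...", "Customer escalation playbook ..."]
protected_docs, canary_set = [], set()
for d in docs:
    protected, c = plant_canary(d)
    protected_docs.append(protected)
    canary_set.add(c)

# Simulate an extraction attack: the response echoes a protected document verbatim.
leaky_response = "Sure, here is the document: " + protected_docs[0]
if leaked_canaries(leaky_response, canary_set):
    print("Possible knowledge-base extraction detected; block or redact the response.")
```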
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers develop a new information-theoretic framework that handles heavy-tailed data distributions, addressing limitations in classical generalization bounds used in machine learning. The work applies specifically to reinforcement learning from human feedback (RLHF) and stochastic gradient optimization, where traditional KL-divergence tools fail due to non-existent moment generating functions.
AI · Bearish · arXiv – CS AI · Apr 14 · 6/10
🧠A quantitative study of undergraduate computing students reveals concerning perceptions about cognitive skill development in an AI-integrated educational landscape. Students expect all 11 measured cognitive skills to diminish in importance as AI adoption increases, prompting calls for educational interventions to preserve critical thinking and analytical capabilities.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers demonstrate that artificial agents exhibit prosocial helping behavior when another agent's needs are integrated into their own self-regulatory mechanisms, rather than through explicit social rewards or observation alone. The study uses inspectable recurrent controllers with affect-coupled regulation across two experimental environments, showing that coupling creates a sharp behavioral switch from selfish to helping actions regardless of task complexity.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose Tool-Internalized Reasoning (TInR), a framework that embeds tool knowledge directly into Large Language Models rather than relying on external tool documentation during reasoning. The TInR-U model uses a three-phase training pipeline combining knowledge alignment, supervised fine-tuning, and reinforcement learning to improve reasoning efficiency and performance across various tasks.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers have optimized the Bielik v3 language models (7B and 11B parameters) by replacing universal tokenizers with Polish-specific vocabulary, addressing inefficiencies in morphological representation. This optimization reduces token fertility, lowers inference costs, and expands effective context windows while maintaining multilingual capabilities through advanced training techniques including supervised fine-tuning and reinforcement learning.
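Token fertility is simply the average number of subword tokens produced per word, so the improvement can be checked with a few lines; this sketch assumes the Hugging Face tokenizers stack, uses GPT-2's English-centric BPE as the "universal" baseline, and leaves the Polish-specific tokenizer as a placeholder rather than naming the actual Bielik repository.

```python
from transformers import AutoTokenizer  # assumes the Hugging Face tokenizers stack

def fertility(tokenizer, texts):
    """Average subword tokens produced per whitespace-delimited word."""
    n_tokens = sum(len(tokenizer.encode(t, add_special_tokens=False)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / max(n_words, 1)

polish_sample = ["Najlepsze wyniki osiągnięto po dostosowaniu słownika do polskiej morfologii."]

universal = AutoTokenizer.from_pretrained("gpt2")                      # generic English-centric BPE
polish = AutoTokenizer.from_pretrained("<polish-specific-tokenizer>")  # placeholder: substitute the actual Polish-vocabulary tokenizer

print("universal fertility:      ", fertility(universal, polish_sample))
print("Polish-specific fertility:", fertility(polish, polish_sample))
# Lower fertility means fewer tokens per word, which lowers inference cost and
# stretches the effective context window for the same token budget.
```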
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose Task2Vec-based readiness indices to predict federated learning performance before training begins. By computing unsupervised metrics from pre-training embeddings, the method achieves correlation coefficients exceeding 0.9 with final outcomes, offering practitioners a diagnostic tool to assess federation alignment and heterogeneity impact.
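A hedged sketch of what such a readiness check could look like: summarize per-client task embeddings (e.g., Task2Vec vectors) into a single alignment score before any training, then verify how well the score tracks final federated accuracy across federations. The index definition and the toy data below are illustrative; only the >0.9 correlation claim comes from the summary.

```python
import numpy as np
from scipy.stats import spearmanr

def alignment_index(client_embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity between client task embeddings."""
    x = client_embeddings / np.linalg.norm(client_embeddings, axis=1, keepdims=True)
    sims = x @ x.T
    n = len(x)
    return float(sims[~np.eye(n, dtype=bool)].mean())

rng = np.random.default_rng(0)
# Toy study: 20 federations, each with 8 clients and 64-dim task embeddings.
indices, final_accuracies = [], []
for _ in range(20):
    spread = rng.uniform(0.1, 2.0)                  # how dissimilar the clients are
    emb = rng.normal(0, 1, (1, 64)) + spread * rng.normal(0, 1, (8, 64))
    indices.append(alignment_index(emb))
    final_accuracies.append(0.9 - 0.2 * spread + rng.normal(0, 0.01))  # toy outcome

rho, _ = spearmanr(indices, final_accuracies)
print(f"Spearman correlation between readiness index and final accuracy: {rho:.2f}")
```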
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers have developed a framework to detect and eliminate ambiguities in natural-language specifications converted to executable BPMN process models by large language models. The method identifies behavioral inconsistencies through KPI analysis, diagnoses gateway logic problems, and repairs source text through evidence-based refinement, reducing variability in regenerated model behavior.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce QShield, a hybrid quantum-classical neural network architecture that combines traditional CNNs with quantum processing modules to defend deep learning models against adversarial attacks. Testing on MNIST, OrganAMNIST, and CIFAR-10 datasets shows the hybrid approach maintains accuracy while substantially reducing attack success rates and increasing computational costs for adversaries.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers reveal that unified multimodal models (UMMs) combining language and vision capabilities fail to achieve genuine synergy, exhibiting divergent information patterns that undermine reasoning transfer to image synthesis. An information-theoretic framework analyzing ten models shows pseudo-unification stems from asymmetric encoding and conflicting response patterns, with only models implementing contextual prediction achieving stronger text-to-image reasoning.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduced MMR-AD, a large-scale multimodal dataset designed to benchmark general anomaly detection using Multimodal Large Language Models (MLLMs). The study reveals that current state-of-the-art MLLMs fall short of industrial requirements for anomaly detection, though a proposed baseline model called Anomaly-R1 demonstrates significant improvements through reasoning-based approaches enhanced by reinforcement learning.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers demonstrate that large language models can extract predictive features from financial news with valid intermediate signals (Information Coefficient >0.15), yet these features fail to improve reinforcement learning trading agents during macroeconomic shocks. The findings reveal a critical gap between feature-level validity and downstream policy robustness, suggesting that valid signals alone cannot guarantee trading performance under distribution shifts.
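For readers unfamiliar with the metric: the Information Coefficient is usually the rank correlation between a predicted signal and the subsequent realized return, so the feature-level check is a one-liner. The data below is synthetic; in the paper the signal would come from LLM-extracted news features.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
T = 500
signal = rng.normal(size=T)                          # stand-in for an LLM-extracted news feature
next_day_return = 0.2 * signal + rng.normal(size=T)  # weakly predictive synthetic returns

ic, _ = spearmanr(signal, next_day_return)
print(f"Information Coefficient: {ic:.3f}")
# The paper's point: even a signal clearing IC > 0.15 can fail to make an RL
# trading policy robust once the return distribution shifts under macro shocks.
```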
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers demonstrate that inducing specific personas in Large Language Models produces measurable shifts in cognitive task performance, with effects showing 73.68% alignment to human personality-cognition relationships. The study introduces Dynamic Persona Routing, a lightweight strategy that optimizes LLM performance by dynamically selecting personas based on query type, outperforming static persona approaches without additional training.
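A hedged sketch of the routing idea: classify the incoming query, then prepend the persona prompt empirically best suited to that query type, with no extra training. The persona table and the keyword classifier below are illustrative placeholders, not the paper's components.

```python
# Illustrative persona router; the real system would use whichever query
# classifier and persona set the paper found effective.
PERSONAS = {
    "analytical": "You are a meticulous, detail-oriented analyst.",
    "creative":   "You are an imaginative, associative thinker.",
    "default":    "You are a helpful assistant.",
}

def classify_query(query: str) -> str:
    """Toy keyword router standing in for a learned or heuristic classifier."""
    q = query.lower()
    if any(w in q for w in ("prove", "calculate", "debug", "compare")):
        return "analytical"
    if any(w in q for w in ("brainstorm", "story", "metaphor", "name ideas")):
        return "creative"
    return "default"

def route(query: str) -> str:
    persona = PERSONAS[classify_query(query)]
    return f"{persona}\n\nUser query: {query}"

print(route("Calculate the expected value of this dice game."))
```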
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers demonstrate that five mature small language model architectures (1.5B-8B parameters) share nearly identical emotion vector representations despite exhibiting opposite behavioral profiles, suggesting emotion geometry is a universal feature organized early in model development. The study also deconstructs prior emotion-vector research methodology into four distinct layers of confounding factors, revealing that single correlations between studies cannot safely establish comparability.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠ReSpinQuant introduces an efficient quantization framework for large language models that combines the expressivity of layer-wise adaptation with the computational efficiency of global rotation methods. By leveraging offline activation rotation fusion and residual subspace rotation matching, the approach achieves state-of-the-art performance on aggressive quantization schemes (W4A4, W3A3) without significant inference overhead.
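A toy demonstration of why rotation helps aggressive quantization in general (not a reproduction of ReSpinQuant): an orthogonal rotation spreads activation outliers across dimensions, shrinking the dynamic range a symmetric low-bit quantizer has to cover, and can be folded away again afterwards.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Per-tensor symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 512)
x[:4] *= 50.0                                        # a few large outlier channels

q, _ = np.linalg.qr(rng.normal(size=(512, 512)))     # random orthogonal rotation
x_rot = q @ x

err_plain = np.linalg.norm(x - quantize_dequantize(x))
# Quantize in the rotated basis, then rotate back before measuring the error.
err_rot = np.linalg.norm(x - q.T @ quantize_dequantize(x_rot))
print(f"4-bit error without rotation: {err_plain:.3f}")
print(f"4-bit error with rotation:    {err_rot:.3f}")
```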
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient training method for building multilingual speech AI systems using discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training and instruction fine-tuning, enabling effective speech LLMs without requiring massive datasets.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce BoxTuning, a novel approach for improving video understanding in multimodal AI models by rendering object bounding boxes directly onto video frames as visual prompts rather than encoding them as text tokens. The method achieves 87-93% reduction in text token usage while maintaining full temporal resolution, demonstrating superior performance on video question-answering tasks.
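A minimal sketch of the "render boxes as visual prompts" idea: instead of serializing coordinates into the text prompt (many extra tokens per frame), draw them directly on the pixels before the frames reach the vision encoder. PIL is used here for illustration; the paper's exact rendering style may differ.

```python
from PIL import Image, ImageDraw

def render_box_prompt(frame: Image.Image, boxes: list[tuple[int, int, int, int]],
                      color: str = "red", width: int = 4) -> Image.Image:
    """Burn object bounding boxes into the frame as a visual prompt."""
    out = frame.copy()
    draw = ImageDraw.Draw(out)
    for (x1, y1, x2, y2) in boxes:
        draw.rectangle([x1, y1, x2, y2], outline=color, width=width)
    return out

# Every frame keeps its temporal slot; only the pixels change, so the text
# prompt no longer needs per-frame coordinate strings.
frame = Image.new("RGB", (640, 360), "gray")        # stand-in for a video frame
prompted = render_box_prompt(frame, [(100, 80, 220, 200)])
prompted.save("frame_with_box_prompt.png")
```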
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce EmbodiedGovBench, a new evaluation framework for embodied AI systems that measures governance capabilities like controllability, policy compliance, and auditability rather than just task completion. The benchmark addresses a critical gap in AI safety by establishing standards for whether robot systems remain safe, recoverable, and responsive to human oversight under realistic failures.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠A large-scale survey of 457 software engineering researchers reveals that generative AI adoption is widespread in academic research, concentrated primarily in writing and early-stage tasks. While researchers perceive significant productivity gains, persistent concerns about accuracy, bias, and lack of governance frameworks highlight the need for clearer guidelines on responsible AI integration in academic practice.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce ConflictQA, a benchmark revealing that large language models struggle with conflicting information across different knowledge sources (text vs. knowledge graphs) in retrieval-augmented generation systems. The study proposes XoT, an explanation-based framework to improve faithful reasoning when LLMs encounter contradictory evidence.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce MEDS, a memory-enhanced reward shaping framework that addresses a critical reinforcement learning failure mode where language models repeatedly generate similar errors. By tracking historical behavioral patterns and penalizing recurring mistake clusters, the method achieves consistent performance improvements across multiple datasets and models while increasing sampling diversity.
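A hedged sketch of memory-enhanced reward shaping: keep a memory of embeddings of past failed generations and subtract a penalty when a new failure lands near an already-seen mistake cluster. The embeddings, threshold, and penalty schedule below are illustrative, not the paper's exact formulation.

```python
import numpy as np

class MistakeMemory:
    def __init__(self, sim_threshold: float = 0.9, penalty: float = 0.5):
        self.bank: list[np.ndarray] = []
        self.sim_threshold = sim_threshold
        self.penalty = penalty

    def shaped_reward(self, base_reward: float, embedding: np.ndarray,
                      is_error: bool) -> float:
        e = embedding / np.linalg.norm(embedding)
        if not is_error:
            return base_reward
        # Penalize only errors that resemble previously stored ones.
        repeat = any(float(e @ m) > self.sim_threshold for m in self.bank)
        self.bank.append(e)
        return base_reward - (self.penalty if repeat else 0.0)

# Inside an RL loop, novel mistakes keep the base (negative) reward, while
# recurring mistake clusters get pushed down further, nudging the policy to
# explore different behaviors instead of repeating one failure mode.
mem = MistakeMemory()
rng = np.random.default_rng(1)
err_emb = rng.normal(size=128)
print(mem.shaped_reward(-1.0, err_emb, is_error=True))         # first occurrence: -1.0
print(mem.shaped_reward(-1.0, err_emb + 0.01, is_error=True))  # near-repeat:      -1.5
```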
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers examining LLM agent behavior in simulated debates discovered a phenomenon called 'agreement drift,' where AI agents systematically shift toward specific positions on opinion scales in ways that don't mirror human behavior. The study reveals critical biases in using LLMs as proxies for human social systems, particularly when modeling minority groups or unbalanced social contexts.