AI

21,013 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

Researchers introduce Sol-RL, a two-stage reinforcement learning framework that combines FP4 quantization for efficient rollout generation with BF16 precision for policy optimization in diffusion models. The approach achieves up to 4.64x training acceleration while maintaining alignment quality, addressing the computational bottleneck of scaling RL-based post-training on large foundational models like FLUX.1.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

The Impact of Steering Large Language Models with Persona Vectors in Educational Applications

Researchers studied how persona vectors (AI steering techniques that inject personality traits into large language models) affect educational applications like essay generation and automated grading. The study found that persona steering significantly degrades answer quality, with substantially larger negative impacts on open-ended humanities tasks than on factual science questions, and that AI scorers exhibit predictable bias patterns based on assigned personality traits.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training

SentinelSphere is an AI-powered cybersecurity platform combining machine learning-based threat detection with LLM-driven security training to address both technical vulnerabilities and human-factor weaknesses in enterprise security. The system uses an Enhanced DNN model trained on benchmark datasets for real-time threat identification and deploys a quantized Phi-4 model for accessible security education, validated by industry professionals as intuitive and effective.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

Researchers introduce MedDialBench, a comprehensive benchmark testing how large language models maintain diagnostic accuracy when patients exhibit adversarial behaviors across five dimensions. The study reveals that fabricating symptoms causes 1.7-3.4x larger accuracy drops than withholding information, with worst-case performance degradation ranging from 38.8 to 54.1 percentage points across tested models.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation

Researchers introduce Privacy-Preserving Fine-Tuning (PPFT), a novel training approach that enables LLM services to process user queries without receiving raw text, addressing privacy vulnerabilities in current deployments. The method uses client-side encoders and noise-injected embeddings to maintain competitive model performance while eliminating exposure of sensitive personal, medical, or legal information.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Instance-Adaptive Parametrization for Amortized Variational Inference

Researchers introduce Instance-Adaptive VAE (IA-VAE), a new framework that uses hypernetworks to generate input-specific parameter modulations for variational autoencoders, reducing the amortization gap while maintaining computational efficiency. The approach demonstrates improved posterior approximation accuracy on synthetic data and consistently better ELBO performance on image benchmarks compared to standard VAEs.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

FedDAP: Domain-Aware Prototype Learning for Federated Learning under Domain Shift

Researchers introduce FedDAP, a federated learning framework that addresses domain shift challenges by constructing domain-specific global prototypes rather than single aggregated prototypes. The method aligns local features with prototypes from the same domain while encouraging separation from different domains, improving model generalization across heterogeneous client data.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Researchers propose a sparse-aware neural network framework that combines convolutional architectures with fully connected networks to improve operator learning over infinite-dimensional function spaces. The approach significantly reduces the curse of dimensionality and sample complexity requirements for approximating nonlinear functionals, with improved theoretical guarantees for both deterministic and random sampling schemes.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

REVEAL: Reasoning-Enhanced Forensic Evidence Analysis for Explainable AI-Generated Image Detection

Researchers introduce REVEAL, an explainable AI framework for detecting AI-generated images through forensic evidence chains and expert-grounded reinforcement learning. The approach addresses the growing challenge of distinguishing synthetic images from authentic ones while providing transparent, verifiable reasoning for detection decisions.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Luwen Technical Report

Researchers have developed Luwen, an open-source Chinese legal language model built on Baichuan that uses continual pre-training, supervised fine-tuning, and retrieval-augmented generation to excel at legal tasks. The model outperforms baselines on five legal benchmarks including judgment prediction, judicial examination, and legal reasoning, demonstrating effective domain adaptation for specialized legal applications.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision

Researchers propose fine-grained confidence calibration methods for large language models in automated code revision tasks, addressing the limitation of traditional global calibration approaches. By applying local Platt-scaling to task-specific confidence scores, the study demonstrates improved calibration accuracy across multiple code repair and refinement tasks, enabling developers to better trust LLM outputs.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

On the Step Length Confounding in LLM Reasoning Data Selection

Researchers identify a critical flaw in naturalness-based data selection methods for large language model reasoning datasets, where algorithms systematically favor longer reasoning steps rather than higher-quality reasoning. The study proposes two corrective methods (ASLEC-DROP and ASLEC-CASL) that successfully mitigate this 'step length confounding' bias across multiple LLM benchmarks.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks

Researchers introduce TeamLLM, a multi-LLM collaboration framework that emulates human team structures with distinct roles to improve performance on complex, multi-step tasks. The team proposes a new CGPST benchmark for evaluating LLM performance on contextualized procedural tasks, demonstrating substantial improvements over single-perspective approaches.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios

Researchers introduce CLI-Tool-Bench, a new benchmark for evaluating large language models' ability to generate complete software from scratch. Testing seven state-of-the-art LLMs reveals that top models achieve under 43% success rates, exposing significant limitations in current AI-driven 0-to-1 software generation despite increased computational investment.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM

Researchers propose G-Defense, a graph-enhanced framework that uses large language models and retrieval-augmented generation to detect fake news while providing explainable, fine-grained reasoning. The system decomposes news claims into sub-claims, retrieves competing evidence, and generates transparent explanations without requiring verified fact-checking databases.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

Researchers introduce SkillSieve, a three-layer detection framework that identifies malicious AI agent skills in OpenClaw's ClawHub marketplace, where 13-26% of over 13,000 skills contain security vulnerabilities. The system combines regex/AST scanning, LLM-based analysis with parallel sub-tasks, and multi-LLM voting to achieve an F1 score of 0.800 at $0.006 per skill, significantly outperforming existing detection methods.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Improving Robustness In Sparse Autoencoders via Masked Regularization

Researchers propose a masked regularization technique to improve the robustness and interpretability of Sparse Autoencoders (SAEs) used in large language model analysis. The method addresses feature absorption and out-of-distribution performance failures by randomly replacing tokens during training to disrupt co-occurrence patterns, offering a practical path toward more reliable mechanistic interpretability tools.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries

Researchers evaluated whether large language models understand long-form narratives similarly to humans by comparing summaries of 150 novels written by humans and nine state-of-the-art LLMs. The study found that LLMs focus disproportionately on story endings rather than distributing attention like human readers, revealing gaps in narrative comprehension despite expanded context windows.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources

Researchers introduce Guardian Parser Pack, an AI-driven system that extracts and normalizes missing-person intelligence from heterogeneous documents using LLM-assisted parsing combined with schema validation. The system achieved an F1 score of 86.64% on manual evaluation while improving data completeness to 96.97%, demonstrating practical viability of probabilistic AI in high-stakes investigative workflows.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

Researchers discovered that large language models have a fundamental limitation in latent reasoning: they can discover multi-step planning strategies without explicit supervision, but only up to depths of 3-7 steps depending on model size and training method. This finding suggests that complex reasoning tasks may require explicit chain-of-thought monitoring rather than relying on hidden internal computations.

GPT-4 · GPT-5

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Neural Computers

Researchers propose Neural Computers (NCs), a new computing paradigm where AI models function as executable runtime environments rather than static predictors. The work demonstrates early NC prototypes using video models that process instructions and user actions to generate screen frames, establishing foundational I/O primitives while identifying significant challenges toward achieving general-purpose Completely Neural Computers (CNCs).

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Say Something Else: Rethinking Contextual Privacy as Information Sufficiency

Researchers formalize privacy-preserving communication for LLM agents by introducing Information Sufficiency (IS) as a framework and proposing free-text pseudonymization as a third privacy strategy alongside suppression and generalization. Evaluation across 792 scenarios reveals that pseudonymization offers superior privacy-utility tradeoffs, and that multi-turn conversational testing exposes significant privacy leakage missed by single-message assessments.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

"Don't Be Afraid, Just Learn": Insights from Industry Practitioners to Prepare Software Engineers in the Age of Generative AI

A study of 51 industry practitioners reveals that generative AI integration into software development has created a significant gap between university curricula and industry hiring expectations. The research identifies new required skills like prompting and output evaluation, while emphasizing that soft skills and traditional competencies remain critical for modern software engineers.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Researchers introduce MAT-Cell, a neuro-symbolic AI framework that combines large language models with biological constraints to improve single-cell annotation accuracy. The system uses multi-agent reasoning and verification processes to overcome limitations in both supervised learning and LLM-based approaches, demonstrating superior performance on cross-species benchmarks.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models

Researchers propose an attribution-driven approach to make encoder-based Large Language Models more transparent and trustworthy for network intrusion detection in Software-Defined Networks. By analyzing which traffic features drive model decisions, the study demonstrates that LLMs learn legitimate attack behavior patterns, addressing a critical barrier to deploying AI security tools in sensitive environments.
