12,520 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠A research paper proposes that AI-driven software engineering doesn't threaten the field but rather expands its scope to include 'semi-executable' artifacts—combinations of natural language, tools, and workflows requiring human or probabilistic interpretation. The Semi-Executable Stack model provides a diagnostic framework across six layers to understand how software engineering practices evolve as AI agents handle routine tasks.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers propose a multi-objective unlearning framework for Large Language Models that simultaneously removes hazardous information, preserves general utility, avoids over-refusal, and resists adversarial attacks. The method uses unified domain representation and bidirectional logit distillation to harmonize competing optimization goals, achieving state-of-the-art performance across diverse unlearning requirements.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠LLMbench is a new browser-based tool that enables detailed comparative analysis of large language model outputs through side-by-side visualization and token-level probability inspection. Unlike existing quantitative comparison tools, it applies digital humanities methodology to make the probabilistic structure of LLM-generated text legible through multiple analytical overlays and visualization modes.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce SSAS, a framework that improves LLM consistency for sentiment analysis by applying hierarchical classification and iterative summarization to enforce bounded attention on raw text. Testing on three standard datasets shows the method reduces analytical variance by up to 30%, addressing the fundamental challenge of using non-deterministic LLMs for enterprise-grade analytics.
🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers demonstrate that reward-weighted classifier-free guidance (RCFG) can dynamically adjust autoregressive model outputs to optimize arbitrary reward functions at test time without retraining. Applied to molecular generation, this approach enables real-time optimization of competing objectives and accelerates reinforcement learning convergence when used as a teacher for policy distillation.
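The core idea of reward-guided test-time sampling can be sketched generically. This is an illustrative exponential-tilting form, not the paper's exact RCFG formulation; the function name and the `beta` guidance parameter are assumptions for the sketch:

```python
import math

def reward_tilted_probs(logprobs, rewards, beta=1.0):
    """Tilt next-token probabilities toward higher-reward continuations.

    logprobs: base model log-probabilities for each candidate
    rewards:  reward-function scores for each candidate
    beta:     guidance strength (beta=0 recovers the base model)
    """
    # Unnormalized weights: p(x) * exp(beta * R(x))
    weights = [math.exp(lp + beta * r) for lp, r in zip(logprobs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

# A higher-reward candidate gains probability mass with no retraining.
base = [math.log(0.5), math.log(0.5)]
tilted = reward_tilted_probs(base, rewards=[0.0, 1.0], beta=2.0)
```

Because the reward enters only at sampling time, the same base model can be steered toward different (even competing) objectives by changing `rewards` and `beta` per request.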
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce CoLabScience, a proactive AI assistant designed to enhance biomedical research collaboration by intervening in scientific discussions at optimal moments. The system uses PULI, a reinforcement learning framework that learns when and how to contribute based on project context and conversation history, supported by a new benchmark dataset (BSDD) of simulated research dialogues.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers conducted a comparative study of how large language models trained with different fine-tuning methods (full fine-tuning, LoRA, and quantized LoRA) interpret code compliance tasks. The study reveals that full fine-tuning produces more focused attribution patterns than parameter-efficient methods, and larger models develop distinct interpretive strategies despite performance gains plateauing above 7B parameters.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers propose DALM, a Domain-Algebraic Language Model that constrains token generation through structured denoising across domain lattices rather than unconstrained decoding. The framework uses algebraic constraints across three phases—domain, relation, and concept resolution—to prevent cross-domain knowledge interference and improve factual accuracy in specialized domains.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠A research study comparing simulated AI interactions with real human subjects reveals that AI transparency significantly outweighs personality factors in determining interaction quality, with findings diverging notably between pure simulation and actual human experiments across hiring and transactional scenarios.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers propose AdaRankLLM, an adaptive retrieval-augmented generation framework that dynamically filters irrelevant passages to reduce computational overhead while maintaining output quality. The study challenges whether adaptive retrieval remains necessary as language models grow more robust, finding that its value differs significantly between weaker and stronger models.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠HYPERHEURIST introduces a simulated annealing control framework that enhances LLM-generated hardware design by treating outputs as optimization candidates rather than final products. The system combines functional validation through compilation and simulation with Power-Performance-Area optimization, demonstrating more stable results than single-pass LLM generation across eight benchmarks.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠SSMamba introduces a self-supervised hybrid state space model designed to improve pathological image classification by addressing domain shift, local-global relationship modeling, and fine-grained feature detection. The framework outperforms 11 state-of-the-art pathological foundation models on multiple public datasets without requiring large external training datasets.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce GTA-2, a hierarchical benchmark that evaluates AI agents on both atomic tool-use tasks and complex, open-ended workflows using real user queries and deployed tools. The study reveals a significant capability cliff where frontier AI models achieve below 50% success on atomic tasks and only 14.39% on realistic workflows, highlighting that execution framework design matters as much as underlying model capacity.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers propose PPRoute, a privacy-preserving framework for LLM routing that uses Secure Multi-Party Computation (MPC) to protect user data while dynamically selecting between model providers. The system achieves 20x speedup over naive MPC implementations through optimized encoder inference, multi-step model training, and an efficient Top-k algorithm, maintaining routing quality without sacrificing privacy.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce DepCap, a training-free framework that optimizes diffusion language model (DLM) inference through adaptive block-wise parallel decoding. The method achieves up to 5.63× speedup by using cross-step signals to determine block boundaries and identifying conflict-free token subsets for safe parallel execution, maintaining quality while significantly accelerating inference.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduced cuNNQS-SCI, a fully GPU-accelerated framework that solves a critical scalability bottleneck in neural network quantum state methods for solving complex quantum systems. The system achieves 2.32× speedup over previous CPU-GPU hybrid approaches while maintaining chemical accuracy, demonstrating 90%+ parallel efficiency across 64 GPUs.
🏢 Nvidia
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce Self-Distillation Fine-Tuning (SDFT), a framework that recovers performance degradation in Large Language Models caused by compression, quantization, and catastrophic forgetting. Using Centered Kernel Alignment analysis, the study demonstrates that self-distillation works by aligning the student model's high-dimensional manifold with the teacher model's optimal representation structure.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce DPrivBench, a benchmark for evaluating how well large language models can reason about differential privacy algorithms and verify their correctness. Testing shows current LLMs handle basic DP mechanisms competently but fail significantly on advanced algorithms, exposing critical gaps in automated privacy reasoning capabilities.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.
🧠 GPT-5
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠This academic paper examines how AI and data science practices can paradoxically increase vulnerability of subjects they aim to protect, using a case study of computer vision analysis of children in monetized YouTube content. The authors develop an ethics protocol identifying four critical decision points—dataset design, operationalization, inference, and dissemination—where technical choices create vulnerabilizing factors including exposure, monetization, narrative fixing, and algorithmic optimization.
AI · Bearish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers discover that post-trained language models experience systematic output diversity collapse, where fine-tuning methods reduce the variety of generated responses compared to base models. This collapse is determined during training by data composition choices and cannot be fixed through inference-time adjustments, with implications for scaling methods and creative AI applications.
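Output diversity of this kind is commonly quantified with a distinct-n metric (the fraction of unique n-grams across sampled responses); the paper's own measure is not specified in this summary, so the sketch below is a generic illustration:

```python
def distinct_n(responses, n=2):
    """Fraction of n-grams that are unique across sampled responses.

    Values near 1.0 indicate varied outputs; values near 0.0 indicate
    the model keeps repeating the same phrases (diversity collapse).
    """
    ngrams = []
    for text in responses:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# A collapsed model returns the same answer every time.
collapsed = ["the answer is yes"] * 5
varied = ["the answer is yes", "probably not", "it depends a lot",
          "hard to say", "yes indeed"]
```

Comparing `distinct_n` between a base model and its post-trained version is one simple way to surface the collapse the paper describes, before and after changing the fine-tuning data mix.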
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduced 'Mind's Eye,' a benchmark that tests multimodal large language models (MLLMs) on visual reasoning tasks inspired by human intelligence tests. The evaluation reveals a significant gap between human performance (80% accuracy) and leading MLLMs (below 50%), exposing limitations in visuospatial reasoning, visual attention, and conceptual abstraction.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce Availability-Weighted Probabilistic Synchronous Parallel (AW-PSP), an improved federated learning algorithm that addresses bias in node sampling when device availability and data distribution are correlated. The technique uses dynamic probability adjustments, Markov-based failure prediction, and distributed metadata management to improve fairness and robustness in edge computing environments where devices frequently fail or become unavailable.
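The sampling bias AW-PSP targets can be illustrated with classic inverse-probability (Horvitz-Thompson) weighting. This is a generic debiasing sketch under assumed scalar updates, not the paper's actual algorithm, which adds failure prediction and metadata management on top:

```python
def aggregate_unbiased(participating, num_devices):
    """Horvitz-Thompson-style debiased aggregation for one round.

    participating: (update, availability_prob) pairs for the devices
        that actually reported this round
    num_devices:   total fleet size

    Dividing each observed update by its availability probability makes
    the round's expected aggregate equal the all-devices average, even
    when availability correlates with data distribution.
    """
    return sum(u / p for u, p in participating) / num_devices
```

With two devices holding updates 0 and 10, where the second is online only half the time, plain averaging of participants systematically undercounts the flaky device; the reweighted estimate averages to the true fleet mean of 5 across rounds.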
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers have developed a precision-aware training time predictor for distributed deep learning that accounts for floating-point precision settings, achieving 9.8% prediction error compared to 147.85% error in existing models that ignore precision variations. The work addresses a critical gap in resource allocation and cost estimation for AI training workloads, where precision choices can create 2.4x variations in training time.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce AtManRL, a method that combines differentiable attention manipulation with reinforcement learning to improve the faithfulness of chain-of-thought reasoning in large language models. By training attention masks to identify which tokens genuinely influence model predictions, the approach demonstrates that LLM reasoning traces can be made more interpretable and transparent.
🧠 Llama