13,003 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠DeepXiv-SDK introduces a new agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns, currently deployed at arXiv scale with free API access.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers developed a physics-informed graph transformer network (PIGTN) for smart grid attack detection, using genetic algorithms to optimize sensor placement. The system achieved up to 37% accuracy improvement and 73% better detection rates while reducing false alarms to 0.3% across multiple power system benchmarks.
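The sensor-placement step can be sketched as a plain genetic algorithm. Everything below is invented for illustration (the 14-bus topology, the ring-coverage model, the budget, and the fitness weights); it is not the paper's PIGTN pipeline:

```python
import random

random.seed(0)

N_BUSES = 14   # IEEE 14-bus style benchmark size (illustrative)
BUDGET = 5     # maximum number of sensors to place
POP, GENS = 30, 40

# Hypothetical observability model: a sensor at bus i "covers" i and its ring
# neighbors. A real study would derive this from the grid's admittance graph.
NEIGHBORS = {i: {(i - 1) % N_BUSES, i, (i + 1) % N_BUSES} for i in range(N_BUSES)}

def coverage(genome):
    covered = set()
    for i, bit in enumerate(genome):
        if bit:
            covered |= NEIGHBORS[i]
    return len(covered)

def fitness(genome):
    # Reward observability, heavily penalize exceeding the sensor budget.
    return coverage(genome) - 10 * max(0, sum(genome) - BUDGET)

def mutate(genome, rate=0.1):
    return [bit ^ (random.random() < rate) for bit in genome]

def crossover(a, b):
    cut = random.randrange(1, N_BUSES)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(N_BUSES)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP // 2]                      # elitist selection
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(POP - len(elite))]

best = max(pop, key=fitness)
print("sensors:", sum(best), "buses covered:", coverage(best))
```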
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers propose a new framework called Relate for evaluating AI moral consideration based on relational capacity rather than consciousness verification. The framework addresses the governance gap that emerges as millions of people form emotional bonds with AI systems while current regulations treat all AI interactions as simple tool use.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠A research paper reveals that generative AI systems deployed in 2025 have significantly higher environmental costs than previous AI generations, while current global regulations inadequately address these impacts. The authors propose mandatory model-level transparency, user opt-out rights, and international coordination to address environmental concerns in AI deployment.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠Researchers developed SurgFusion-Net, a multimodal AI system for assessing surgical skills in robotic-assisted surgery. The system introduces new clinical datasets and fusion techniques that outperform existing baselines, addressing the domain gap between simulation and real clinical environments.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠A research study evaluated how four major large language models (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) respond to patient preferences in clinical decision-making scenarios. While all models acknowledged patient values, they showed modest actual recommendation shifting with value sensitivity indices ranging from 0.13 to 0.27, revealing gaps in how AI systems incorporate patient preferences into medical recommendations.
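The value-sensitivity numbers above are easy to picture with a toy metric. The definition below is one plausible reading of such an index, not the paper's exact formula, and the recommendation probabilities are invented:

```python
# Hypothetical value-sensitivity index (VSI): the mean absolute shift in a
# model's recommendation probability when a patient-preference statement is
# added to the prompt. Values near 0 mean the model ignores stated preferences.

def value_sensitivity_index(baseline_probs, preference_probs):
    """Mean |shift| in P(recommend option A) across scenarios, in [0, 1]."""
    assert len(baseline_probs) == len(preference_probs)
    shifts = [abs(b - p) for b, p in zip(baseline_probs, preference_probs)]
    return sum(shifts) / len(shifts)

# Toy numbers (not from the paper): the model barely moves toward the
# patient-preferred option even when preferences are stated explicitly.
baseline   = [0.80, 0.65, 0.90, 0.70]
with_prefs = [0.72, 0.55, 0.84, 0.58]
print(round(value_sensitivity_index(baseline, with_prefs), 2))  # → 0.09
```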
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of large language models (LLMs) for text generation assessment. The framework addresses scattered evaluation techniques by providing a unified solution with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics across three evaluation benchmarks.
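The moving parts of rubric-based, multi-judge evaluation can be sketched generically. Note this is not Autorubric's actual API; the rubric, weights, and judge scores below are all invented:

```python
from statistics import mean, pstdev

RUBRIC = {"accuracy": 0.5, "coherence": 0.3, "style": 0.2}  # criterion -> weight

def aggregate(judge_scores):
    """judge_scores: list of {criterion: score in [0, 1]} dicts, one per judge.

    Returns (weighted mean score, per-criterion disagreement) so callers can
    flag low-reliability criteria instead of trusting a single number.
    """
    overall = mean(
        sum(RUBRIC[c] * scores[c] for c in RUBRIC) for scores in judge_scores
    )
    disagreement = {
        c: pstdev([scores[c] for scores in judge_scores]) for c in RUBRIC
    }
    return overall, disagreement

judges = [
    {"accuracy": 0.9, "coherence": 0.8, "style": 0.6},
    {"accuracy": 0.7, "coherence": 0.8, "style": 0.9},
    {"accuracy": 0.8, "coherence": 0.7, "style": 0.7},
]
score, spread = aggregate(judges)
print(round(score, 3), {c: round(v, 3) for c, v in spread.items()})
```

A multi-judge ensemble with a disagreement readout like this is one way to surface the reliability issues the framework targets: a high mean with high per-criterion spread signals a criterion the judges cannot score consistently.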
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers developed a foundational crop-weed detection model combining DINOv3 vision transformer with YOLO26 architecture, achieving significant improvements in precision agriculture applications. The model showed up to 14% better performance on cross-domain datasets while maintaining real-time processing at 28.5 fps despite increased computational requirements.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9
🧠Researchers argue that current AI evaluation methods fail to properly measure true AI capabilities and propensities, which should be treated as dispositional properties. The paper proposes a more scientific framework for AI evaluation that requires mapping causal relationships between contextual conditions and behavioral outputs, moving beyond simple benchmark averages.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠Researchers developed VisRef, a new framework that improves visual reasoning in large AI models by re-injecting relevant visual tokens during the reasoning process. The method avoids expensive reinforcement learning fine-tuning while achieving up to 6.4% performance improvements on visual reasoning benchmarks.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠A new study evaluates how 78 industrial practitioners apply the EU AI Act's Risk Classification Scheme using a web-based tool, revealing challenges in interpreting legal definitions and regulatory scope. The research shows that targeted support with clear explanations can significantly improve the AI risk classification process for compliance.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠Researchers have developed MIDAS, a new jailbreaking framework that successfully bypasses safety mechanisms in Multimodal Large Language Models by dispersing harmful content across multiple images. The technique achieved an 81.46% average attack success rate against four closed-source MLLMs by extending reasoning chains and reducing security attention.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers argue that LLM-based AI agents are not yet effective for social simulation, despite growing optimism in the field. The paper identifies systematic mismatches between what current agent systems produce and what scientific simulation requires, calling for more rigorous validation frameworks.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 10
🧠A research paper proposes a 5E framework (ethical, epistemological, explainable, empirical, evaluative) for contesting Artificial Moral Agents (AMAs), AI systems with inherent moral reasoning capabilities. The framework includes spheres of ethical influence at individual, local, societal, and global levels, along with a timeline for developers to anticipate or self-contest their AMA technologies.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠Researchers introduce Coupled Discrete Diffusion (CoDD), a breakthrough framework that solves the "factorization barrier" in diffusion language models by enabling parallel token generation without sacrificing coherence. The approach uses a lightweight probabilistic inference layer to model complex joint dependencies while maintaining computational efficiency.
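The "factorization barrier" itself is easy to demonstrate with a two-token toy distribution. This only illustrates the problem CoDD targets; the paper's probabilistic inference layer is not shown:

```python
from itertools import product

# A toy two-token distribution whose mass sits on two coherent phrases.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals, as a factorized parallel-decoding step would use.
m1 = {"New": 0.5, "Los": 0.5}
m2 = {"York": 0.5, "Angeles": 0.5}

# Independent sampling from the marginals spreads mass over ALL pairs,
# including incoherent ones -- the factorization barrier.
factorized = {(a, b): m1[a] * m2[b] for a, b in product(m1, m2)}

incoherent_mass = sum(p for pair, p in factorized.items() if pair not in joint)
print(factorized[("New", "Angeles")], incoherent_mass)  # → 0.25 0.5
```

Half the factorized model's probability mass lands on phrases the true distribution never produces, which is why parallel token generation needs some model of joint dependencies to stay coherent.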
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers propose REMIND, a framework for medical multi-modal AI learning that addresses the challenge of missing data across multiple modalities. The solution uses a Mixture-of-Experts architecture to handle long-tail distributions of modality combinations and shows superior performance on real-world medical datasets.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers present a formal geometric theory for quantifying the alignment tax - the tradeoff between AI safety and capability performance. They derive mathematical frameworks showing how safety-capability conflicts can be measured using angles between representation subspaces and provide scaling laws for how these tradeoffs evolve with model size.
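In the simplest one-direction-per-subspace case, the geometric picture reduces to the angle between two vectors. A sketch of the intuition with invented directions; the paper's theory concerns whole representation subspaces, where principal angles play this role:

```python
import math

def angle_between(u, v):
    """Angle (radians) between two direction vectors in activation space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

safety     = [1.0, 0.0, 0.0]        # illustrative directions, not real model data
capability = [1.0, 1.0, 0.0]

theta = angle_between(safety, capability)
# If the directions were orthogonal (theta = pi/2) there would be no tax:
# optimizing along one would leave the other untouched. The smaller the
# angle, the more a capability update drags the safety direction with it.
print(round(math.degrees(theta), 1))  # → 45.0
```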
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers introduce MOSAIC, the first comprehensive benchmark to evaluate moral, social, and individual characteristics of Large Language Models beyond traditional Moral Foundation Theory. The benchmark includes over 600 curated questions and scenarios from nine validated questionnaires and four platform-based games, providing empirical evidence that current evaluation methods are insufficient for assessing AI ethics comprehensively.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠Researchers have developed L-REINFORCE, a novel reinforcement learning algorithm that provides probabilistic stability guarantees for control systems using finite data samples. The approach bridges reinforcement learning and control theory by extending classical REINFORCE algorithms with Lyapunov stability methods, demonstrating superior performance in Cartpole simulations.
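The flavor of coupling a policy-gradient method with a Lyapunov decrease condition can be sketched on a 1-D stabilization task. The dynamics, gains, and penalty below are invented for illustration; this is not the paper's L-REINFORCE algorithm or its probabilistic guarantees:

```python
import random

random.seed(1)

V = lambda x: x * x                          # Lyapunov candidate V(x) = x^2

def run_episode(theta, steps=20):
    """Gaussian policy a ~ N(theta * x, 1) on dynamics x' = x + 0.1 * a.
    Returns (REINFORCE gradient estimate d/dtheta, episode return)."""
    x, grad, ret = 2.0, 0.0, 0.0
    for _ in range(steps):
        mu = theta * x
        a = random.gauss(mu, 1.0)
        grad += (a - mu) * x                 # score function for the policy mean
        x_next = x + 0.1 * a
        ret -= V(x_next)                     # task reward: stay near 0
        if V(x_next) > V(x):                 # Lyapunov decrease violated
            ret -= 1.0                       # stability penalty
        x = x_next
    return grad, ret

def avg_return(theta, episodes=20):
    return sum(run_episode(theta)[1] for _ in range(episodes)) / episodes

# A stabilizing gain (x' = 0.5x + noise) beats a destabilizing one (x' = 1.1x),
# so following the penalized policy gradient steers toward stable controllers.
print(avg_return(-5.0) > avg_return(1.0))  # → True
```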
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9
🧠Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach consistently performs well across all properties, with attention-based pooling excelling in generalization while second-order pooling provides better sensitivity.
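The pooling tradeoff can be seen in miniature with two readout operators over per-node features: a count-sensitive sum pooling versus a normalized attention-weighted one. Toy code with an invented attention vector, not the paper's benchmark models:

```python
import math

def sum_pool(h):
    """Sum node feature vectors: sensitive to how many nodes are present."""
    return [sum(col) for col in zip(*h)]

def attention_pool(h, w):
    """Softmax(<w, h_i>) scores per node, then a weighted mean of features."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in h]
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    alphas = [e / z for e in exp]
    return [sum(a * x[j] for a, x in zip(alphas, h)) for j in range(len(h[0]))]

# Duplicate every node: sum pooling changes, attention pooling does not.
g1 = [[1.0, 0.0], [0.0, 1.0]]
g2 = g1 + g1
w = [1.0, 1.0]
print(sum_pool(g1), sum_pool(g2))
print(attention_pool(g1, w), attention_pool(g2, w))
```

Sum pooling registers node multiplicity (sensitivity to the property being counted), while the normalized attention readout is invariant to duplicating every node, which helps it generalize across graph sizes; neither behavior is right for all sixteen properties, matching the study's finding that no single pooling wins everywhere.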
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers introduce StaTS, a new diffusion model for time series forecasting that learns adaptive noise schedules and uses frequency-guided denoising. The model addresses limitations of fixed noise schedules in existing diffusion models by incorporating spectral regularization and data-adaptive scheduling for improved structural preservation.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers introduce Attn-QAT, the first systematic approach to 4-bit quantization-aware training for attention mechanisms in AI models. The method enables stable FP4 computation on emerging GPUs and delivers up to 1.5x speedup on RTX 5090 while maintaining model quality across diffusion and language models.
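The core mechanic of quantization-aware training is a "fake quantization" op in the forward pass. A minimal signed-integer-grid sketch (Attn-QAT targets FP4 attention kernels, a different number format; the scale here is invented):

```python
# Values are rounded to a 4-bit grid in the forward pass, so training sees the
# quantization error, while the backward pass treats the op as identity
# (the straight-through estimator skips the non-differentiable round()).

def fake_quant_int4(x, scale):
    """Quantize x to signed 4-bit levels [-8, 7] at the given scale, then
    dequantize back to a float carrying the quantization error."""
    q = round(x / scale)
    q = max(-8, min(7, q))        # clamp to the 4-bit integer range
    return q * scale

scale = 0.25
for x in [0.1, 0.37, 1.9, -2.5]:
    print(x, "->", fake_quant_int4(x, scale))
```

Inserting such ops around attention matmuls is what lets the trained weights survive the move to true low-precision kernels at inference time, which is where the reported RTX 5090 speedup would come from.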
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers introduce CARE, a new framework for improving LLM evaluation by addressing correlated errors in AI judge ensembles. The method separates true quality signals from confounding factors like verbosity and style preferences, achieving up to 26.8% error reduction across 12 benchmarks.
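Why correlated judge errors are the right target: averaging K judges only divides error variance by K when the errors are independent, and a bias every judge shares (e.g. all rewarding verbosity) survives averaging entirely. A synthetic sketch with invented noise levels, not CARE itself:

```python
import random
from statistics import mean

random.seed(7)

TRUE_SCORE = 0.0
N, K = 2000, 5   # trials, judges per ensemble

def ensemble_mse(shared_sd):
    """MSE of a K-judge average when judges share a common bias term."""
    errs = []
    for _ in range(N):
        shared = random.gauss(0, shared_sd)   # bias component every judge shares
        judges = [TRUE_SCORE + shared + random.gauss(0, 1.0) for _ in range(K)]
        errs.append(mean(judges) - TRUE_SCORE)
    return mean(e * e for e in errs)

print(round(ensemble_mse(0.0), 2))   # ≈ 1/K = 0.2: independent errors average out
print(round(ensemble_mse(1.0), 2))   # ≈ 1.2: the shared component survives averaging
```

Separating that shared component from the true quality signal, rather than adding more judges, is the kind of intervention the summary describes.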
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠Researchers have introduced LitBench, a new benchmarking tool designed to develop and evaluate domain-specific large language models for literature-related tasks. The tool uses graph-centric data curation to generate domain-specific literature sub-graphs and creates training datasets, with results showing small domain-specific LLMs achieving competitive performance against state-of-the-art models like GPT-4o.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.