AI

21,463 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Pulse-Driven Neural Architecture: Learnable Oscillatory Dynamics for Robust Continuous-Time Sequence Processing

Researchers introduce PDNA (Pulse-Driven Neural Architecture), a continuous-time neural network that incorporates learnable oscillatory dynamics to improve robustness when input sequences are interrupted. On sequential MNIST tasks, the pulse variant outperforms baseline models by 4.62 percentage points.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

PEPA: a Persistently Autonomous Embodied Agent with Personalities

Researchers developed PEPA, a three-layer cognitive architecture that enables robots to operate autonomously using personality traits to generate goals without external supervision. The system was successfully tested on a quadruped robot in a real-world office environment, demonstrating sustained autonomous behavior across five personality prototypes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

DeepXiv-SDK: An Agentic Data Interface for Scientific Papers

DeepXiv-SDK introduces a new agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns, currently deployed at arXiv scale with free API access.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

Alignment Is Not Enough: A Relational Framework for Moral Standing in Human-AI Interaction

Researchers propose Relate, a framework that bases AI moral consideration on relational capacity rather than on verifying consciousness. It targets a governance gap: millions of people form emotional bonds with AI systems, yet current regulations treat all AI interactions as simple tool use.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Autorubric: A Unified Framework for Rubric-Based LLM Evaluation

Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of text generated by large language models (LLMs). It consolidates scattered evaluation techniques into a unified solution with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics, demonstrated across three evaluation benchmarks.
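The summary does not show Autorubric's own API, so the following is a hypothetical sketch (not the framework's interface) of one common way to combine a multi-judge ensemble with a weighted rubric: each judge returns pass/fail per criterion, a majority vote settles each criterion, and the score is the weighted share of criteria passed. The criterion names, weights, and judge verdicts are invented for illustration.

```python
from collections import Counter

# Hypothetical rubric: criterion name -> weight (not Autorubric's actual config format).
RUBRIC = {"factual_accuracy": 2.0, "addresses_question": 1.0, "concise": 0.5}


def majority_verdict(votes: list[bool]) -> bool:
    """Return the majority pass/fail verdict; ties count as a fail."""
    counts = Counter(votes)
    return counts[True] > counts[False]


def ensemble_score(judge_verdicts: list[dict[str, bool]], rubric: dict[str, float]) -> float:
    """Weighted fraction of rubric criteria passed after majority voting across judges."""
    total = sum(rubric.values())
    earned = 0.0
    for criterion, weight in rubric.items():
        votes = [verdicts[criterion] for verdicts in judge_verdicts]
        if majority_verdict(votes):
            earned += weight
    return earned / total


# Three hypothetical LLM judges grading one response.
judges = [
    {"factual_accuracy": True, "addresses_question": True, "concise": False},
    {"factual_accuracy": True, "addresses_question": False, "concise": False},
    {"factual_accuracy": False, "addresses_question": True, "concise": True},
]
print(ensemble_score(judges, RUBRIC))
```

Here two criteria pass by majority (weights 2.0 and 1.0) and "concise" fails, giving 3.0 / 3.5. Majority voting is only one aggregation choice; bias mitigation and reliability metrics, which the framework also reports, are not modeled here.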

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Exact and Asymptotically Complete Robust Verifications of Neural Networks via Quantum Optimization

Researchers have developed quantum optimization models for robust verification of deep neural networks against adversarial attacks. The approach provides exact verification for ReLU networks and asymptotically complete verification for networks with general activation functions like sigmoid and tanh.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

Joint Sensor Deployment and Physics-Informed Graph Transformer for Smart Grid Attack Detection

Researchers developed a physics-informed graph transformer network (PIGTN) for smart grid attack detection, using genetic algorithms to optimize sensor placement. The system achieved up to 37% accuracy improvement and 73% better detection rates while reducing false alarms to 0.3% across multiple power system benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making

A research study evaluated how four major large language models (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) respond to patient preference statements in clinical decision-making scenarios. Although all models acknowledged patient values, they shifted their actual recommendations only modestly, with value sensitivity indices ranging from 0.13 to 0.27, revealing gaps in how AI systems incorporate patient preferences into medical recommendations.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6

Stochastic Parrots or Singing in Harmony? Testing Five Leading LLMs for their Ability to Replicate a Human Survey with Synthetic Data

Researchers compared human survey responses from 420 Silicon Valley developers with synthetic data from five leading LLMs, including ChatGPT, Claude, and Gemini. Although the models produced technically plausible results, they failed to capture counterintuitive insights, replicating conventional wisdom rather than revealing novel findings.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 10

Contesting Artificial Moral Agents

A research paper proposes a 5E framework (ethical, epistemological, explainable, empirical, evaluative) for contesting Artificial Moral Agents (AMAs), AI systems with inherent moral reasoning capabilities. The framework defines spheres of ethical influence at individual, local, societal, and global levels, along with a timeline that helps developers anticipate or self-contest their AMA technologies.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

Self-Service or Not? How to Guide Practitioners in Classifying AI Systems Under the EU AI Act

A new study evaluates how 78 industrial practitioners apply the EU AI Act's Risk Classification Scheme using a web-based tool, revealing challenges in interpreting legal definitions and regulatory scope. The research shows that targeted support with clear explanations can significantly improve the AI risk classification process for compliance.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9

Measuring What AI Systems Might Do: Towards A Measurement Science in AI

Researchers argue that current AI evaluation methods fail to properly measure true AI capabilities and propensities, which should be treated as dispositional properties. The paper proposes a more scientific framework for AI evaluation that requires mapping causal relationships between contextual conditions and behavioral outputs, moving beyond simple benchmark averages.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

The Global Landscape of Environmental AI Regulation: From the Cost of Reasoning to a Right to Green AI

A research paper reveals that generative AI systems deployed in 2025 have significantly higher environmental costs than previous AI generations, while current global regulations inadequately address these impacts. The authors propose mandatory model-level transparency, user opt-out rights, and international coordination to address environmental concerns in AI deployment.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

M3-AD: Reflection-aware Multi-modal, Multi-category, and Multi-dimensional Benchmark and Framework for Industrial Anomaly Detection

Researchers propose M3-AD, a reflection-aware multimodal framework that uses large language models to improve industrial anomaly detection. Its RA-Monitor component lets models self-correct unreliable decisions, and the system outperforms existing open-source and commercial models on zero-shot anomaly detection tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

Expert Divergence Learning for MoE-based Language Models

Researchers introduce Expert Divergence Learning, a pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize the differences in routing distributions between data domains, improving performance on 15-billion-parameter models with minimal computational overhead.
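As a rough illustration of "maximizing routing distribution differences between domains", here is a minimal sketch that assumes the objective is the mean pairwise Jensen-Shannon divergence between each domain's average router distribution; the paper's exact divergence, weighting, and per-layer handling are not given in the summary. The loss is returned negated so that minimizing it pushes domains toward different experts. In real training this would be a differentiable tensor expression added, with some weight, to the language-modeling loss; plain Python lists keep the sketch dependency-free.

```python
import math
from itertools import combinations


def mean_distribution(rows: list[list[float]]) -> list[float]:
    """Average per-token routing distributions into one distribution over experts."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q); terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence: symmetric and finite even with zero entries."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def domain_divergence_loss(routing: list[list[float]], domains: list[str]) -> float:
    """Negative mean pairwise JS divergence between per-domain average routing.

    Assumption: this is one plausible form of the objective, not the paper's exact loss.
    """
    by_domain: dict[str, list[list[float]]] = {}
    for probs, domain in zip(routing, domains):
        by_domain.setdefault(domain, []).append(probs)
    means = [mean_distribution(rows) for rows in by_domain.values()]
    pairs = list(combinations(means, 2))
    if not pairs:
        return 0.0  # a single domain gives nothing to separate
    return -sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Four tokens routed over two experts, from two hypothetical domains.
routing = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
domains = ["code", "code", "math", "math"]
print(domain_divergence_loss(routing, domains))
```

Jensen-Shannon is used here because it stays finite when an expert receives zero probability; with identical routing across domains the loss is 0, and it becomes more negative as domains favor different experts.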

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

What Is the Geometry of the Alignment Tax?

Researchers present a formal geometric theory for quantifying the alignment tax: the tradeoff between AI safety and capability. They show how safety-capability conflicts can be measured using angles between representation subspaces and provide scaling laws for how these tradeoffs evolve with model size.
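A standard way to measure "angles between subspaces" is through principal angles: orthonormalize a basis for each subspace, then take the arccosine of the singular values of the product of the two bases. The sketch below assumes each subspace is given as a matrix whose columns span it (for example, hypothetical "safety" and "capability" directions extracted from model activations); whether the paper uses exactly this construction is an assumption.

```python
import numpy as np


def principal_angles(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between the column spaces of a and b."""
    # Orthonormal bases for each subspace (assumes full column rank).
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # Singular values of Qa^T Qb are the cosines of the principal angles, largest first.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    # Clipping guards against round-off pushing a cosine slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


# Hypothetical subspaces in R^3: span{e1, e2} vs span{e1, e3} share exactly one direction.
safety = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
capability = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(np.degrees(principal_angles(safety, capability)))
```

This prints angles of roughly 0 and 90 degrees: a 0° angle means the subspaces share a direction, and 90° means a direction in one is orthogonal to the other. How such angles translate into an alignment tax and its scaling with model size is the paper's contribution and is not reproduced here.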

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

REMIND: Rethinking Medical High-Modality Learning under Missingness--A Long-Tailed Distribution Perspective

Researchers propose REMIND, a framework for medical multi-modal AI learning that addresses the challenge of missing data across multiple modalities. The solution uses a Mixture-of-Experts architecture to handle long-tail distributions of modality combinations and shows superior performance on real-world medical datasets.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

Researchers introduce SupervisorAgent, a lightweight framework that reduces token consumption in Multi-Agent Systems by 29.68% while maintaining performance. The system provides real-time supervision and error correction without modifying base agent architectures, validated across multiple AI benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

SurgFusion-Net: Diversified Adaptive Multimodal Fusion Network for Surgical Skill Assessment

Researchers developed SurgFusion-Net, a multimodal AI system for assessing surgical skills in robotic-assisted surgery. The system introduces new clinical datasets and fusion techniques that outperform existing baselines, addressing the domain gap between simulation and real clinical environments.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks

Researchers have introduced LitBench, a benchmarking tool for developing and evaluating domain-specific large language models on literature-related tasks. The tool uses graph-centric data curation to generate domain-specific literature sub-graphs and construct training datasets; results show small domain-specific LLMs performing competitively with state-of-the-art models such as GPT-4o.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

MOSAIC: Unveiling the Moral, Social and Individual Dimensions of Large Language Models

Researchers introduce MOSAIC, the first comprehensive benchmark to evaluate moral, social, and individual characteristics of Large Language Models beyond traditional Moral Foundation Theory. The benchmark includes over 600 curated questions and scenarios from nine validated questionnaires and four platform-based games, providing empirical evidence that current evaluation methods are insufficient for assessing AI ethics comprehensively.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Breaking the Factorization Barrier in Diffusion Language Models

Researchers introduce Coupled Discrete Diffusion (CoDD), a framework for breaking the "factorization barrier" in diffusion language models: when several tokens are generated in parallel, each position is typically sampled from its own independent distribution, so dependencies between those tokens are lost. CoDD enables parallel token generation without sacrificing coherence by adding a lightweight probabilistic inference layer that models joint dependencies while maintaining computational efficiency.
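To make the factorization barrier concrete, here is a toy illustration (not CoDD itself): if two positions are sampled in parallel from independent per-position marginals, the product of those marginals can put probability on pairs that the true joint distribution never produces. The two-token vocabulary is invented for the example.

```python
from itertools import product

# Toy joint distribution over two-token outputs: only coherent pairs have mass.
JOINT = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}


def marginals(joint: dict[tuple[str, str], float]) -> tuple[dict[str, float], dict[str, float]]:
    """Per-position marginal distributions of a two-position joint distribution."""
    first: dict[str, float] = {}
    second: dict[str, float] = {}
    for (a, b), p in joint.items():
        first[a] = first.get(a, 0.0) + p
        second[b] = second.get(b, 0.0) + p
    return first, second


def factorized(joint: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Distribution obtained by sampling each position independently from its marginal."""
    first, second = marginals(joint)
    return {(a, b): pa * pb for (a, pa), (b, pb) in product(first.items(), second.items())}


def incoherent_mass(joint: dict[tuple[str, str], float]) -> float:
    """Probability that factorized parallel sampling emits a pair the joint never produces."""
    return sum(p for pair, p in factorized(joint).items() if joint.get(pair, 0.0) == 0.0)


print(incoherent_mass(JOINT))  # → 0.5
```

Half of the factorized samples are "New Angeles" or "Los York", even though each per-position marginal is exactly right. Modeling the joint dependency between positions, which is what the summary says CoDD's inference layer does, is what removes this error.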

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Researchers introduce CARE, a new framework for improving LLM evaluation by addressing correlated errors in AI judge ensembles. The method separates true quality signals from confounding factors like verbosity and style preferences, achieving up to 26.8% error reduction across 12 benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser

Researchers introduce StaTS, a new diffusion model for time series forecasting that learns adaptive noise schedules and uses frequency-guided denoising. The model addresses limitations of fixed noise schedules in existing diffusion models by incorporating spectral regularization and data-adaptive scheduling for improved structural preservation.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

Researchers introduce Attn-QAT, the first systematic approach to 4-bit quantization-aware training for attention mechanisms in AI models. The method enables stable FP4 computation on emerging GPUs and delivers up to a 1.5x speedup on an RTX 5090 while maintaining model quality across diffusion and language models.
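Quantization-aware training usually relies on "fake quantization": values are rounded to the low-precision grid and immediately converted back, so the forward pass sees the quantization error. The sketch below does this for the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). The single per-tensor absmax scale is an assumption for simplicity; hardware FP4 formats typically use finer-grained block scales, and Attn-QAT's actual scheme is not described in the summary.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1; the sign is stored in a separate bit.
FP4_E2M1 = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quant_fp4(values: list[float]) -> list[float]:
    """Quantize to FP4 E2M1 and immediately dequantize, using one per-tensor scale.

    Assumption: a single absmax scale per tensor, chosen for simplicity.
    """
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return [0.0 for _ in values]
    scale = amax / FP4_E2M1[-1]  # map the largest magnitude onto 6.0
    out = []
    for v in values:
        mag = abs(v) / scale
        nearest = min(FP4_E2M1, key=lambda q: abs(q - mag))  # round to the nearest grid point
        out.append(math.copysign(nearest * scale, v))
    return out


# In QAT the forward pass would use these values, while the backward pass typically
# treats the rounding as the identity (straight-through estimator).
print(fake_quant_fp4([0.3, -1.2, 6.0, 2.6]))  # → [0.5, -1.0, 6.0, 3.0]
```

The uneven spacing of the grid (steps of 0.5 near zero, 2.0 near the top) is why scale choice matters so much at 4 bits: outliers that set a large scale push many small attention values toward zero.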

← PrevPage 561 of 859Next →