🧠

AI

21,472 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 27/1025

Capabilities Ain't All You Need: Measuring Propensities in AI

Researchers introduce the first formal framework for measuring AI propensities, the tendencies of models to exhibit particular behaviors, going beyond traditional capability measurements. The new bilogistic approach predicts AI behavior on held-out tasks, and combining propensities with capabilities yields stronger predictive power than either measure alone.

AI · Bullish · arXiv – CS AI · Mar 26/1020

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis

Researchers introduced Resp-Agent, an AI system that uses multimodal deep learning to generate respiratory sounds and diagnose diseases. The system addresses data scarcity and representation gaps in medical AI through an autonomous agent-based approach and includes a new benchmark dataset of 229k recordings.

AI · Neutral · arXiv – CS AI · Mar 27/1017

RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis

Researchers introduce RooflineBench, a framework for measuring performance capabilities of Small Language Models on edge devices using operational intensity analysis. The study reveals that sequence length significantly impacts performance, model depth causes efficiency regression, and structural improvements like Multi-head Latent Attention can unlock better hardware utilization.
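For readers unfamiliar with roofline analysis, the standard bound can be sketched as follows. This is the textbook roofline formula, not RooflineBench's own code; the function name and the hardware numbers are hypothetical and chosen only for illustration.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops: float, bytes_moved: float) -> float:
    """Classic roofline bound for one kernel on one device.

    Operational intensity is FLOPs performed per byte moved to or from memory.
    Attainable throughput is capped by either the compute peak or by
    memory bandwidth times operational intensity, whichever is lower.
    """
    intensity = flops / bytes_moved  # FLOPs per byte
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical device: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
# Low intensity (2 FLOPs/byte): memory-bound.
memory_bound = attainable_gflops(1000.0, 100.0, flops=2e9, bytes_moved=1e9)
# High intensity (20 FLOPs/byte): compute-bound.
compute_bound = attainable_gflops(1000.0, 100.0, flops=20e9, bytes_moved=1e9)
```

On this hypothetical device the two regimes meet at 10 FLOPs per byte (peak divided by bandwidth); workloads below that intensity are limited by memory traffic rather than compute.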

AI · Neutral · arXiv – CS AI · Mar 27/1019

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Researchers have developed an automated pipeline to detect hidden biases in Large Language Models that don't appear in their reasoning explanations. The system discovered previously unknown biases like Spanish fluency and writing formality across seven LLMs in hiring, loan approval, and university admission tasks.

AI · Bullish · arXiv – CS AI · Mar 26/1014

SleepLM: Natural-Language Intelligence for Human Sleep

Researchers have developed SleepLM, a family of AI foundation models that combine natural language processing with sleep analysis using polysomnography data. The system can interpret and describe sleep patterns in natural language, trained on over 100K hours of sleep data from 10,000+ individuals, enabling new capabilities like language-guided sleep event detection and zero-shot generalization to novel sleep analysis tasks.

AI · Bullish · arXiv – CS AI · Mar 27/1016

SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer

Researchers developed Score Matched Actor-Critic (SMAC), a new offline reinforcement learning method that enables smooth transition to online RL algorithms without performance drops. SMAC achieved successful transfer in all 6 D4RL tasks tested and reduced regret by 34-58% in 4 of 6 environments compared to best baselines.

AI · Bearish · arXiv – CS AI · Mar 27/1019

Beyond Accuracy: Risk-Sensitive Evaluation of Hallucinated Medical Advice

Researchers propose a new risk-sensitive framework for evaluating AI hallucinations in medical advice that considers potential harm rather than just factual accuracy. The study reveals that AI models with similar performance show vastly different risk profiles when generating medical recommendations, highlighting critical safety gaps in current evaluation methods.

AI · Bullish · arXiv – CS AI · Mar 26/1020

DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter

Researchers developed DECO, a multimodal diffusion transformer for bimanual robot manipulation that integrates vision, proprioception, and tactile signals. The system achieved a 72.25% success rate on complex manipulation tasks, a 21% improvement over baseline methods, when tested on over 2,000 robot rollouts.

AI · Bullish · arXiv – CS AI · Mar 26/1018

LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment

Researchers developed LIA, a supervised fine-tuning approach using DeepSeek-R1-Distill-Llama-8B to automatically assign software issues to developers. The system achieved up to 187.8% improvement over the base model and 211.2% better performance than existing methods in developer recommendation accuracy.

AI · Bullish · arXiv – CS AI · Mar 26/1014

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Researchers propose Trust Region Masking (TRM) to address off-policy mismatch problems in Large Language Model reinforcement learning pipelines. The method provides the first non-vacuous monotonic improvement guarantees for long-horizon LLM-RL tasks by masking entire sequences that violate trust region constraints.
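The idea of masking whole sequences rather than individual tokens can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the violation test (largest absolute per-token log-probability gap between the current and behavior policies), the threshold `delta`, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple


def sequence_mask(logp_new: Sequence[float], logp_old: Sequence[float],
                  delta: float) -> float:
    """Return 1.0 to keep a sequence in the update, 0.0 to drop it entirely.

    Assumed criterion: the sequence violates the trust region if any token's
    log-probability under the current policy differs from the behavior
    policy's by more than `delta`.
    """
    worst_gap = max(abs(n - o) for n, o in zip(logp_new, logp_old))
    return 1.0 if worst_gap <= delta else 0.0


def batch_masks(batch: List[Tuple[Sequence[float], Sequence[float]]],
                delta: float = 0.2) -> List[float]:
    """Compute one keep/drop weight per sequence in a batch."""
    return [sequence_mask(new, old, delta) for new, old in batch]


# Two toy sequences of per-token log-probs: (current policy, behavior policy).
toy_batch = [
    ([-1.0, -2.0], [-1.05, -2.1]),  # small drift on every token: kept
    ([-1.0, -2.0], [-1.0, -3.0]),   # one token drifts by 1.0: whole sequence masked
]
masks = batch_masks(toy_batch, delta=0.2)
```

In a training loop, each weight would multiply that sequence's entire contribution to the policy loss, so a single off-policy token removes the whole sequence instead of just clipping that token.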

AI · Bullish · arXiv – CS AI · Mar 26/1017

VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance

Researchers introduce VISTA, a framework for vessel trajectory imputation that uses knowledge-driven LLM reasoning to repair incomplete maritime tracking data. The system provides 'repair provenance' (documented reasoning behind each data repair), achieving 5-91% accuracy improvements over existing methods while reducing inference time by 51-93%.

AI · Bullish · arXiv – CS AI · Mar 27/1019

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Researchers propose Generalized Primal Averaging (GPA), a new optimization method that improves training speed for large language models by 8-10% over standard AdamW while using less memory. GPA unifies and enhances existing averaging-based optimizers like DiLoCo by enabling smooth iterate averaging at every step without complex two-loop structures.

AI · Bullish · arXiv – CS AI · Mar 26/1014

GenAI-Net: A Generative AI Framework for Automated Biomolecular Network Design

Researchers have developed GenAI-Net, a generative AI framework that automates the design of chemical reaction networks (CRNs) for synthetic biology applications. The system can automatically generate biomolecular circuits for various functions including logic gates, oscillators, and classifiers, potentially accelerating the development of biomanufacturing and therapeutic technologies.

AI · Bearish · arXiv – CS AI · Mar 26/1018

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Researchers introduce FRIEDA, a new benchmark for testing cartographic reasoning in large vision-language models, revealing significant limitations. The best AI models achieve only 37-38% accuracy compared to 84.87% human performance on complex map interpretation tasks requiring multi-step spatial reasoning.

AI · Bullish · arXiv – CS AI · Mar 26/1014

WisPaper: Your AI Scholar Search Engine

WisPaper is a new AI-powered academic search system that combines semantic search capabilities with automated paper validation and organization tools. The system achieved 22.26% recall on TaxoBench and 93.70% validation accuracy, addressing key limitations in current academic search engines by integrating discovery, organization, and monitoring workflows.

AI · Bullish · arXiv – CS AI · Mar 27/1019

SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Researchers developed SocialNav, a foundation model for socially-aware robot navigation that uses a hierarchical architecture to understand social norms and generate compliant movement paths. The model was trained on 7 million samples and achieved 38% better success rates and 46% improved social compliance compared to existing methods.

AI · Neutral · arXiv – CS AI · Mar 27/1023

SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios

Researchers introduce SWITCH, a new benchmark for testing autonomous AI agents' ability to interact with physical interfaces like switches and appliance panels in real-world scenarios. The benchmark reveals significant gaps in current AI models' capabilities for long-horizon tasks requiring causal reasoning and verification.

AI · Bullish · arXiv – CS AI · Mar 27/1019

VCWorld: A Biological World Model for Virtual Cell Simulation

Researchers have developed VCWorld, a new AI-powered biological simulation system that combines large language models with structured biological knowledge to predict cellular responses to drug perturbations. The system operates as a 'white-box' model, providing interpretable predictions and mechanistic insights while achieving state-of-the-art performance in drug perturbation benchmarks.

AI · Bullish · arXiv – CS AI · Mar 27/1016

DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone

Researchers introduce DiffuMamba, a new diffusion language model built on a Mamba backbone that achieves up to 8.2x higher inference throughput than Transformer-based models while maintaining comparable performance. The model scales linearly with sequence length and represents a significant advance in efficient AI text generation.

AI · Bullish · arXiv – CS AI · Mar 27/1024

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.

AI · Bullish · arXiv – CS AI · Mar 26/1018

QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory

Researchers propose QKAN-LSTM, a quantum-inspired neural network that integrates quantum variational activation functions into the LSTM architecture for sequential modeling. The model achieves superior predictive accuracy with 79% fewer parameters than classical LSTMs while remaining executable on classical hardware.

AI · Neutral · arXiv – CS AI · Mar 27/1018

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Researchers analyzed how large language models express moral judgments when prompted to role-play different personas. The study found that Claude models are the most morally robust, while larger models within a family tend to be more susceptible to moral shifts through persona conditioning.

AI · Neutral · arXiv – CS AI · Mar 27/1022

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

Researchers developed an offline-to-online reinforcement learning framework that improves robot control robustness through adversarial fine-tuning. The method trains policies on clean datasets then applies action perturbations during fine-tuning to build resilience against actuator faults and environmental uncertainties.

AI · Bullish · arXiv – CS AI · Mar 27/1019

Thompson Sampling via Fine-Tuning of LLMs

Researchers developed ToSFiT (Thompson Sampling via Fine-Tuning), a new Bayesian optimization method that uses fine-tuned large language models to improve search efficiency in complex discrete spaces. The approach eliminates computational bottlenecks by directly parameterizing reward probabilities and demonstrates superior performance across diverse applications including protein search and quantum circuit design.

AI · Bullish · arXiv – CS AI · Mar 26/1021

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

Researchers developed Speculative Verdict (SV), a training-free framework that improves large Vision-Language Models' ability to reason over information-dense images by combining multiple small draft models with a larger verdict model. The approach achieves better accuracy on visual question answering benchmarks while reducing computational costs compared to large proprietary models.
