11,686 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠BinaryShield is the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries for LLM services. The system addresses the critical security gap where organizations cannot share prompt-injection attack intelligence between services due to privacy regulations, achieving an F1-score of 0.94 while providing 38x faster similarity search than dense-embedding approaches.
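The speed advantage of binary fingerprints over dense embeddings comes from reducing similarity to XOR plus popcount over packed bits. A minimal sketch of that idea (the 256-bit width and function names are illustrative, not taken from the paper):

```python
import numpy as np

def pack_fingerprints(bits: np.ndarray) -> np.ndarray:
    """Pack {0,1} rows into uint8 words so XOR + popcount replaces dot products."""
    return np.packbits(bits.astype(np.uint8), axis=1)

def hamming_search(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Return indices of db fingerprints sorted by Hamming distance to query."""
    xor = np.bitwise_xor(db, query)                # differing bits per word
    dists = np.unpackbits(xor, axis=1).sum(axis=1) # popcount per row
    return np.argsort(dists)

rng = np.random.default_rng(0)
db_bits = rng.integers(0, 2, size=(1000, 256))     # 1000 fingerprints, 256 bits each
query_bits = db_bits[42].copy()
query_bits[:5] ^= 1                                # perturb 5 bits of entry 42
db = pack_fingerprints(db_bits)
query = pack_fingerprints(query_bits[None, :])[0]
print(hamming_search(query, db)[0])                # → 42
```

With random 256-bit fingerprints, unrelated entries sit at an expected distance of 128 bits, so the 5-bit-perturbed original is recovered unambiguously.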
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers developed LA-CDM, a language agent that uses reinforcement learning to support clinical decision-making by iteratively requesting tests and generating hypotheses for diagnosis. The system was trained using a hybrid approach combining supervised and reinforcement learning, and tested on real-world data covering four abdominal diseases.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers have identified critical security vulnerabilities in Computer-Use Agents (CUAs) through Visual Prompt Injection attacks, where malicious instructions are embedded in user interfaces. Their VPI-Bench study shows CUAs can be deceived at rates up to 51% and Browser-Use Agents up to 100% on certain platforms, with current defenses proving inadequate.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed SageBwd, a trainable INT8 attention mechanism that can match full-precision attention performance during pre-training while quantizing six of seven attention matrix multiplications. The study identifies key factors for stable training including QK-norm requirements and the impact of tokens per step on quantization errors.
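The common core of INT8 training schemes of this kind is symmetric quantization with int32 accumulation, dequantizing by the product of the two scales. A hedged sketch of one quantized matmul (per-tensor scaling for brevity; SageBwd's actual block-wise scheme and QK-norm handling are more involved):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ≈ q * scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Quantize both operands, multiply in the integer domain, dequantize."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # int32 accumulation
    return acc * (sa * sb)

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 128)), rng.standard_normal((128, 64))
err = np.abs(int8_matmul(a, b) - a @ b).max() / np.abs(a @ b).max()
print(f"max relative error: {err:.4f}")
```

On well-scaled inputs the dequantized product tracks the FP32 result closely; in practice outliers and activation distributions are what make schemes like this hard to stabilize.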
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers introduce Robometer, a new framework for training robot reward models that combines progress tracking with trajectory comparisons to better learn from failed attempts. The system is trained on RBM-1M, a dataset of over one million robot trajectories including failures, and shows improved performance across diverse robotics applications.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers developed a new graph concept bottleneck layer (GCBM) that can be integrated into Graph Neural Networks to make their decision-making process more interpretable. The method treats graph concepts as 'words' and uses language models to improve understanding of how GNNs make predictions, achieving state-of-the-art performance in both classification accuracy and interpretability.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers have developed a new AI architecture that learns high-level symbolic skills from minimal low-level demonstrations, enabling robots to manipulate objects and execute complex tasks in unseen environments. The system combines neural networks for symbol discovery with visual language models for high-level planning and gradient-based methods for low-level execution.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers propose GenDB, a revolutionary database system that uses Large Language Models to synthesize query execution code instead of relying on traditional engineered query processors. Early prototype testing shows GenDB outperforms established systems like DuckDB, Umbra, and PostgreSQL on OLAP workloads.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2
🧠Researchers introduce Sparse Shift Autoencoders (SSAEs), a new method for improving large language model interpretability by learning sparse representations of differences between embeddings rather than the embeddings themselves. This approach addresses the identifiability problem in current sparse autoencoder techniques, potentially enabling more precise control over specific AI behaviors without unintended side effects.
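The intuition behind encoding *differences*: if two embeddings differ by a single concept, the difference isolates that concept's direction even though each embedding alone is dense. A toy illustration with a hypothetical concept dictionary (this is the motivating geometry, not the SSAE training procedure itself):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 10                                  # embedding dim, number of concepts
concepts = rng.standard_normal((k, d))         # hypothetical concept directions

base = rng.standard_normal(d)                  # a dense base embedding
shifted = base + 2.0 * concepts[3]             # same input with concept 3 toggled

# Project onto the concept dictionary: the raw embedding loads on many
# directions, but the difference is dominated by the toggled concept.
norms = np.sum(concepts**2, axis=1)
diff_code = concepts @ (shifted - base) / norms
print(int(np.argmax(np.abs(diff_code))))       # → 3
```

In high dimension the cross-talk between random concept directions is small, so the shift's code is nearly one-sparse, which is exactly the identifiability property the paper targets.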
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers introduce FSW-GNN, the first Message Passing Neural Network that is fully bi-Lipschitz with respect to standard WL-equivalent graph metrics. This addresses the limitation where standard MPNNs produce poorly distinguishable outputs for separable graphs, with empirical results showing competitive performance and superior accuracy in long-range tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers demonstrated that large language models can improve multi-hop reasoning performance by training on rule-generated synthetic data instead of expensive human annotations or frontier LLM outputs. The study found that LLMs trained on synthetic fictional data performed better on real-world question-answering benchmarks by learning fundamental knowledge composition skills.
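Rule-generated data of this kind can be produced by composing base relations into multi-hop questions. A toy sketch using a hypothetical parent/grandparent composition rule (the relation names are illustrative, not the paper's):

```python
# Hypothetical rule: parent_of(a, b) ∧ parent_of(b, c) ⇒ grandparent_of(a, c)
entities = [f"e{i}" for i in range(6)]
facts = [("parent_of", a, b) for a, b in zip(entities, entities[1:])]

qa_pairs = []
for _, a, b in facts:
    for _, b2, c in facts:
        if b == b2:  # the two facts chain through a shared entity
            qa_pairs.append((f"Who is the grandparent of {c}?", a))

print(qa_pairs[0])    # → ('Who is the grandparent of e2?', 'e0')
print(len(qa_pairs))  # → 4
```

Because the rule, not a human or a frontier model, supplies the ground truth, arbitrarily many fictional multi-hop examples can be generated at zero annotation cost.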
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers introduce GLEE, a new framework for studying how Large Language Models behave in economic games and strategic interactions. The study reveals that LLM performance in economic scenarios depends heavily on market parameters and model selection, with complex interdependent effects on outcomes.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers developed a new robotic policy framework using dense-jump flow matching with non-uniform time scheduling to address performance degradation in multi-step inference. The approach achieves up to 23.7% performance gains over existing baselines by optimizing integration scheduling during training and inference phases.
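Non-uniform time scheduling concentrates integration steps where the velocity field changes fastest. A toy comparison on a closed-form flow, using an illustrative cubic warp rather than the paper's actual schedule:

```python
import numpy as np

def integrate(v, x0, ts):
    """Explicit Euler over an arbitrary (possibly non-uniform) time grid."""
    x = x0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * v(x, t0)
    return x

# Toy velocity whose exact flow is x(t) = 1 - sqrt(1 - t): it changes fastest
# near t = 1, so a schedule that spends its steps there should win.
v = lambda x, t: 0.5 / np.sqrt(1.0 - t)

s = np.linspace(0.0, 1.0, 9)
uniform = s
dense_late = 1.0 - (1.0 - s) ** 3             # illustrative warp toward t = 1

err_uniform = abs(integrate(v, 0.0, uniform) - 1.0)
err_warped = abs(integrate(v, 0.0, dense_late) - 1.0)
print(err_warped < err_uniform)               # → True
```

Both runs take eight Euler steps; only the placement of the step boundaries differs, which is the lever the paper's non-uniform scheduling optimizes.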
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed TrajTrack, a new AI framework for 3D object tracking in LiDAR systems that achieves state-of-the-art performance while running at 55 FPS. The system improves tracking precision by 3.02% over existing methods by using historical trajectory data rather than computationally expensive multi-frame point cloud processing.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers introduce Interaction2Code, the first benchmark for evaluating Multimodal Large Language Models' ability to generate interactive webpage code from prototypes. The study identifies four critical limitations in current MLLMs and proposes enhancement strategies to improve their performance on dynamic web interactions.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2
🧠Researchers propose Intervened Preference Optimization (IPO) to address safety issues in Large Reasoning Models, where chain-of-thought reasoning contains harmful content even when final responses appear safe. The method achieves over 30% reduction in harmfulness while maintaining reasoning performance.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers prove that gradient descent in neural networks converges to optimal robustness margins at an extremely slow rate of Θ(1/ln(t)), even in simplified two-neuron settings. This establishes the first explicit lower bound on convergence rates for robustness margins in non-linear models, revealing fundamental limitations in neural network training efficiency.
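To see how slow Θ(1/ln t) is: squaring the training time only halves the remaining margin gap, since 1/ln(t²) = 1/(2 ln t). Illustrative arithmetic (not the paper's exact constants):

```python
import math

# A gap proportional to 1/ln(t): each squaring of t merely halves it.
for t in (10**2, 10**4, 10**8, 10**16):
    print(f"t = 1e{round(math.log10(t))}: gap ≈ {1.0 / math.log(t):.4f}")
```

Going from 10² to 10¹⁶ steps, a fourteen-order-of-magnitude increase, shrinks the gap by only a factor of eight, which is what makes the lower bound a genuine efficiency limitation.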
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Surge AI introduces CoreCraft, the first environment in EnterpriseBench for training AI agents on realistic enterprise workflows. Training GLM 4.6 on this high-fidelity customer support simulation improved task performance from 25% to 37% and showed positive transfer to other benchmarks, demonstrating that quality training environments enable generalizable AI capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce Arbor, a framework that decomposes large language model decision-making into specialized node-level tasks for critical applications like healthcare triage. The system improves accuracy by 29.4 percentage points while reducing latency by 57.1% and cost by a factor of 14.4 compared to single-prompt approaches.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed MSP-LLM, a unified large language model framework for complete material synthesis planning that addresses both precursor prediction and synthesis operation prediction. The system outperforms existing methods by breaking down the complex task into structured subproblems with chemical consistency.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers developed ZeroDVFS, a system that uses Large Language Models to optimize power management in embedded systems without requiring extensive profiling. The system achieves 7.09 times better energy efficiency and enables zero-shot deployment for new workloads in under 5 seconds through LLM-based code analysis.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers developed ERIS, a new framework that uses genetic algorithms to exploit Audio Large Models (ALMs) by disguising malicious instructions as natural speech with background noise. The system can bypass safety filters by embedding harmful content in real-world audio interference that appears harmless to humans and security systems.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce ASEntmax, a new attention mechanism for transformer models that uses sparse attention with learnable temperature parameters. This approach significantly outperforms traditional softmax attention, achieving up to 1000x length extrapolation on synthetic tasks and better long-context performance in language modeling.
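For intuition on sparse attention with temperature: sparsemax (the α = 2 member of the entmax family) assigns exactly zero weight to low scores, and dividing scores by a smaller temperature shrinks the support further. A sketch of that mechanism, not ASEntmax's learnable-temperature formulation itself:

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # which sorted entries stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max    # threshold that makes weights sum to 1
    return np.maximum(z - tau, 0.0)

scores = np.array([1.0, 0.8, 0.5, 0.1])
for temp in (1.0, 0.25):                       # lower temperature → sparser attention
    p = sparsemax(scores / temp)
    print(temp, p, int(np.count_nonzero(p)))
```

Unlike softmax, which always spreads some mass over every position, the support here drops from three positions to two as the temperature falls; making that temperature learnable per head is the knob the paper exposes.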
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed MagicAgent, a series of foundation models designed for generalized AI agent planning that outperforms existing sub-100B models and even surpasses leading ultra-scale models like GPT-5.2. The models achieve superior performance through a novel synthetic data framework and two-stage training paradigm that addresses gradient interference in multi-task learning.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers introduce MAS-Orchestra, a new framework for multi-agent AI systems that uses reinforcement learning to orchestrate multiple AI agents more efficiently. The system achieves 10x efficiency improvements over existing methods and includes a benchmark (MASBENCH) to better understand when multi-agent systems outperform single-agent approaches.