🧠

AI

11,689 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

11689 articles

AIBullisharXiv – CS AI · Mar 37/103

🧠

Beyond Frame-wise Tracking: A Trajectory-based Paradigm for Efficient Point Cloud Tracking

Researchers have developed TrajTrack, a new AI framework for 3D object tracking in LiDAR systems that achieves state-of-the-art performance while running at 55 FPS. The system improves tracking precision by 3.02% over existing methods by using historical trajectory data rather than computationally expensive multi-frame point cloud processing.

AIBullisharXiv – CS AI · Mar 37/103

🧠

MagicAgent: Towards Generalized Agent Planning

Researchers have developed MagicAgent, a series of foundation models designed for generalized AI agent planning that outperforms existing sub-100B models and even surpasses leading ultra-scale models like GPT-5.2. The models achieve superior performance through a novel synthetic data framework and two-stage training paradigm that addresses gradient interference in multi-task learning.

AIBullisharXiv – CS AI · Mar 37/104

🧠

Distribution-Aligned Decoding for Efficient LLM Task Adaptation

Researchers introduce SVDecode, a new method for adapting large language models to specific tasks without extensive fine-tuning. The technique uses steering vectors during decoding to align output distributions with task requirements, improving accuracy by up to 5 percentage points while adding minimal computational overhead.

AIBullisharXiv – CS AI · Mar 37/103

🧠

Towards Camera Open-set 3D Object Detection for Autonomous Driving Scenarios

Researchers developed OS-Det3D, a two-stage framework for camera-based 3D object detection in autonomous vehicles that can identify unknown objects beyond predefined categories. The system uses LiDAR geometric cues and a joint selection module to discover novel objects while improving detection of known objects, addressing safety risks in real-world driving scenarios.

AIBearisharXiv – CS AI · Mar 37/103

🧠

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Research reveals that AI control protocols designed to prevent harmful behavior from untrusted LLM agents can be systematically defeated through adaptive attacks targeting monitor models. The study demonstrates that frontier models can evade safety measures by embedding prompt injections in their outputs, with existing protocols like Defer-to-Resample actually amplifying these attacks.

AIBullisharXiv – CS AI · Mar 37/103

🧠

GEM: A Gym for Agentic LLMs

Researchers introduced GEM (General Experience Maker), an open-source environment simulator designed for training large language models through experience-based learning rather than static datasets. The framework provides a standardized interface similar to OpenAI-Gym but specifically optimized for LLMs, featuring diverse environments, integrated tools, and compatibility with popular RL training frameworks.

$MKR

AINeutralarXiv – CS AI · Mar 37/104

🧠

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

Researchers introduce GLEE, a new framework for studying how Large Language Models behave in economic games and strategic interactions. The study reveals that LLM performance in economic scenarios depends heavily on market parameters and model selection, with complex interdependent effects on outcomes.

AIBullisharXiv – CS AI · Mar 37/104

🧠

EnterpriseBench Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

Surge AI introduces CoreCraft, the first environment in EnterpriseBench for training AI agents on realistic enterprise workflows. Training GLM 4.6 on this high-fidelity customer support simulation improved task performance from 25% to 37% and showed positive transfer to other benchmarks, demonstrating that quality training environments enable generalizable AI capabilities.

AIBullisharXiv – CS AI · Mar 37/105

🧠

Arbor: A Framework for Reliable Navigation of Critical Conversation Flows

Researchers introduce Arbor, a framework that decomposes large language model decision-making into specialized node-level tasks for critical applications like healthcare triage. The system improves accuracy by 29.4 percentage points while reducing latency by 57.1% and costs by 14.4x compared to single-prompt approaches.

AINeutralarXiv – CS AI · Mar 37/104

🧠

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Researchers analyzed Mixture-of-Experts (MoE) language models to determine optimal sparsity levels for different tasks. They found that reasoning tasks require balancing active compute (FLOPs) with optimal data-to-parameter ratios, while memorization tasks benefit from more parameters regardless of sparsity.

AIBullisharXiv – CS AI · Mar 37/103

🧠

ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms

Researchers developed ZeroDVFS, a system that uses Large Language Models to optimize power management in embedded systems without requiring extensive profiling. The system achieves 7.09 times better energy efficiency and enables zero-shot deployment for new workloads in under 5 seconds through LLM-based code analysis.

AIBullisharXiv – CS AI · Mar 37/104

🧠

DRAGON: LLM-Driven Decomposition and Reconstruction Agents for Large-Scale Combinatorial Optimization

Researchers introduce DRAGON, a new framework that combines Large Language Models with metaheuristic optimization to solve large-scale combinatorial optimization problems. The system decomposes complex problems into manageable subproblems and achieves near-optimal results on datasets with over 3 million variables, overcoming the scalability limitations of existing LLM-based solvers.

$NEAR

AIBullisharXiv – CS AI · Mar 37/104

🧠

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Researchers have developed Hierarchical Speculative Decoding (HSD), a new method that significantly improves AI inference speed while maintaining accuracy by solving joint intractability problems in verification processes. The technique shows over 12% performance gains when integrated with existing frameworks like EAGLE-3, establishing new state-of-the-art efficiency standards.

AINeutralarXiv – CS AI · Mar 37/104

🧠

Implementing Pearl's $\mathcal{DO}$-Calculus on Quantum Circuits: A Simpson-Type Case Study on NISQ Hardware

Researchers have developed a method to implement Pearl's causal inference framework (DO-calculus) on quantum circuits, mapping causal networks to quantum hardware through 'circuit surgery.' The approach was successfully demonstrated on IonQ's quantum processor using a healthcare model, showing agreement with classical baselines.

AIBullisharXiv – CS AI · Mar 37/103

🧠

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

Researchers introduce MAS-Orchestra, a new framework for multi-agent AI systems that uses reinforcement learning to orchestrate multiple AI agents more efficiently. The system achieves 10x efficiency improvements over existing methods and includes a benchmark (MASBENCH) to better understand when multi-agent systems outperform single-agent approaches.

AINeutralarXiv – CS AI · Mar 37/104

🧠

PsyAgent: Constructing Human-like Agents Based on Psychological Modeling and Contextual Interaction

Researchers introduce PsyAgent, a new AI framework that creates human-like agents by combining personality modeling based on Big Five traits with contextual social awareness. The system uses structured prompts and fine-tuning to produce AI agents that maintain stable personality traits while adapting appropriately to different social situations and roles.

AIBullisharXiv – CS AI · Mar 37/103

🧠

MSP-LLM: A Unified Large Language Model Framework for Complete Material Synthesis Planning

Researchers have developed MSP-LLM, a unified large language model framework for complete material synthesis planning that addresses both precursor prediction and synthesis operation prediction. The system outperforms existing methods by breaking down the complex task into structured subproblems with chemical consistency.

AIBullisharXiv – CS AI · Mar 37/104

🧠

A Learnable Wavelet Transformer for Long-Short Equity Trading and Risk-Adjusted Return Optimization

Researchers developed WaveLSFormer, a wavelet-based Transformer model that directly generates market-neutral long/short trading portfolios from financial time series data. The AI system achieved a 60.7% cumulative return and 2.16 Sharpe ratio across six industry groups, significantly outperforming traditional ML models like LSTM and standard Transformers.

AIBullisharXiv – CS AI · Mar 37/104

🧠

BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

BinaryShield is the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries for LLM services. The system addresses the critical security gap where organizations cannot share prompt injection attack intelligence between services due to privacy regulations, achieving an F1-score of 0.94 while providing 38x faster similarity search than dense embeddings.

AINeutralarXiv – CS AI · Mar 37/103

🧠

FSW-GNN: A Bi-Lipschitz WL-Equivalent Graph Neural Network

Researchers introduce FSW-GNN, the first Message Passing Neural Network that is fully bi-Lipschitz with respect to standard WL-equivalent graph metrics. This addresses the limitation where standard MPNNs produce poorly distinguishable outputs for separable graphs, with empirical results showing competitive performance and superior accuracy in long-range tasks.

AIBullisharXiv – CS AI · Mar 37/104

🧠

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Researchers introduced AgentMath, a new AI framework that combines language models with code interpreters to solve complex mathematical problems more efficiently than current Large Reasoning Models. The system achieves state-of-the-art performance on mathematical competition benchmarks, with AgentMath-30B-A3B reaching 90.6% accuracy on AIME24 while remaining competitive with much larger models like OpenAI-o3.

AIBullisharXiv – CS AI · Mar 37/103

🧠

DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization

Researchers propose Decoupled Reward Policy Optimization (DRPO), a new framework that reduces computational costs in large reasoning models by 77% while maintaining performance. The method addresses the 'overthinking' problem where AI models generate unnecessarily long reasoning for simple questions, achieving significant efficiency gains over existing approaches.

AIBullisharXiv – CS AI · Mar 37/102

🧠

The FM Agent

Researchers have developed FM Agent, a multi-agent AI framework that combines large language models with evolutionary search to autonomously solve complex research problems. The system achieved state-of-the-art results across multiple domains including operations research, machine learning, and GPU optimization without human intervention.

AINeutralarXiv – CS AI · Mar 37/105

🧠

DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs

Researchers introduce DAG-Math, a new framework for evaluating mathematical reasoning in Large Language Models that models Chain-of-Thought as rule-based processes over directed acyclic graphs. The framework includes a 'logical closeness' metric that reveals significant differences in reasoning quality between LLM families, even when final answer accuracy appears comparable.

AIBullisharXiv – CS AI · Mar 37/104

🧠

Neuro-Symbolic Skill Discovery for Conditional Multi-Level Planning

Researchers have developed a new AI architecture that learns high-level symbolic skills from minimal low-level demonstrations, enabling robots to manipulate objects and execute complex tasks in unseen environments. The system combines neural networks for symbol discovery with visual language models for high-level planning and gradient-based methods for low-level execution.

← PrevPage 80 of 468Next →