11,680 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers developed a training method for large-scale Mixture-of-Experts (MoE) models that uses FP4 precision on Hopper GPUs, which lack native 4-bit support. The technique achieves a 14.8% memory reduction and a 12.5% throughput improvement for 671B-parameter models by using FP4 for activations while keeping core computations in FP8.
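The "FP4 activations, higher-precision compute" split can be illustrated with a minimal NumPy sketch. The E2M1 value grid and the per-tensor max-scaling rule here are illustrative assumptions, not the paper's kernel:

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's kernel): fake-quantize
# activations onto the 15 signed values representable in FP4 (E2M1) with a
# per-tensor scale, while the matmul itself stays in ordinary float --
# mirroring the "FP4 activations, higher-precision compute" split.
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[:0:-1], FP4_POS])  # signed E2M1 values

def quantize_fp4(x):
    """Snap each entry of x to the nearest scaled FP4 (E2M1) value."""
    scale = np.abs(x).max() / 6.0 + 1e-12  # map the max magnitude to 6.0
    idx = np.abs(x[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))       # toy activations
weights = rng.normal(size=(8, 3))    # toy expert weights
q_acts = quantize_fp4(acts)          # 4-bit-representable activations
out = q_acts @ weights               # compute stays in full precision here
```

The quantization error is bounded by half the widest scaled grid gap, which is why the paper can confine FP4 to activations while the matmuls stay in FP8.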
AI Neutral · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers introduce SteerEval, a new benchmark for evaluating how controllable Large Language Models are across language features, sentiment, and personality domains. The study reveals that current steering methods often fail at finer-grained control levels, highlighting significant risks when deploying LLMs in socially sensitive applications.
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers developed GPUTOK, a GPU-accelerated tokenizer for large language models that processes text significantly faster than existing CPU-based solutions. The optimized version shows 1.7x speed improvement over tiktoken and 7.6x over HuggingFace's GPT-2 tokenizer while maintaining output quality.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers propose CAPT, a Confusion-Aware Prompt Tuning framework that addresses systematic misclassifications in vision-language models like CLIP by learning from the model's own confusion patterns. The method uses a Confusion Bank to model persistent category misalignments and introduces specialized modules to capture both semantic and sample-level confusion cues.
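A confusion-bank-style data structure can be sketched as an exponentially decayed confusion matrix. The momentum value and toy prediction stream below are illustrative assumptions, not CAPT's actual design:

```python
import numpy as np

# Hypothetical sketch in the spirit of a "confusion bank": an exponentially
# decayed confusion matrix records which class pairs the model persistently
# mixes up; the hardest pair could then be upweighted during prompt tuning.
n_classes = 4
bank = np.zeros((n_classes, n_classes))

def record(bank, true_label, pred_label, momentum=0.9):
    """Decay old evidence, then log the new (true, predicted) event."""
    bank *= momentum
    bank[true_label, pred_label] += 1.0 - momentum

# toy stream of (true, predicted) pairs: class 1 is persistently read as 2
for t, p in [(0, 0), (1, 2), (1, 2), (2, 1), (3, 3), (1, 2)]:
    record(bank, t, p)

off_diag = bank - np.diag(np.diag(bank))   # ignore correct predictions
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
```

The decay keeps the bank focused on *persistent* misalignments rather than one-off errors, which matches the summary's emphasis on systematic confusion.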
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 Researchers introduced PC Agent-E, an efficient AI agent training framework that achieves human-like computer use with minimal human demonstration data. Starting with just 312 human-annotated trajectories and augmenting them with Claude 3.7 Sonnet synthesis, the model achieved a 141% relative improvement and outperformed Claude 3.7 Sonnet by 10% on the WindowsAgentArena-V2 benchmark.
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 Researchers introduce Conditioned Activation Transport (CAT), a new framework to prevent text-to-image AI models from generating unsafe content while preserving image quality for legitimate prompts. The method uses a geometry-based conditioning mechanism and nonlinear transport maps, validated on Z-Image and Infinity architectures with significantly reduced attack success rates.
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 Researchers introduce MASPOB, a bandit-based framework that optimizes prompts for Multi-Agent Systems using Graph Neural Networks to handle topology-induced coupling. The system reduces search complexity from exponential to linear while achieving state-of-the-art performance across benchmarks.
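The bandit layer of such a system can be sketched with plain UCB1 over candidate prompts for a single agent; MASPOB's GNN handling of topology-induced coupling is not modeled here, and the prompt set and reward values are invented for illustration:

```python
import math
import random

# Illustrative sketch of the bandit layer only: UCB1 chooses among candidate
# prompts for one agent, with reward() standing in for a noisy task score.
random.seed(0)
prompts = ["be concise", "think step by step", "cite sources"]
true_quality = [0.3, 0.8, 0.5]   # hypothetical per-prompt task quality

def reward(arm):
    return true_quality[arm] + random.uniform(-0.1, 0.1)

counts = [0] * len(prompts)
values = [0.0] * len(prompts)
for t in range(1, 2001):
    if 0 in counts:
        arm = counts.index(0)    # play each arm once first
    else:
        ucb = [values[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(len(prompts))]
        arm = ucb.index(max(ucb))
    r = reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
```

Over enough rounds the best prompt dominates the pull counts while suboptimal prompts are tried only logarithmically often, which is the linear-versus-exponential saving the summary alludes to.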
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers propose a heterogeneous computing framework for Mixture-of-Experts AI models that combines analog in-memory computing with digital processing to improve energy efficiency. The approach identifies noise-sensitive experts for digital computation while running the majority on analog hardware, eliminating the need for costly retraining of large models.
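The analog/digital split can be sketched by ranking experts by their output sensitivity under simulated analog noise and keeping only the most sensitive ones digital. The noise model and sensitivity proxy below are assumptions for illustration, not the paper's procedure:

```python
import numpy as np

# Illustrative sketch: estimate each expert's sensitivity to analog noise,
# then keep only the most sensitive experts on (noise-free) digital compute.
rng = np.random.default_rng(0)
n_experts, d = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
x = rng.normal(size=d)

def analog_matvec(W, v, noise=0.05):
    """Analog compute modeled as a matvec with perturbed weights."""
    return (W + noise * rng.normal(size=W.shape)) @ v

# sensitivity proxy: mean output deviation under analog noise
sens = [np.mean([np.linalg.norm(analog_matvec(W, x) - W @ x)
                 for _ in range(20)]) for W in experts]
digital_set = set(np.argsort(sens)[-2:].tolist())  # 2 most sensitive

outputs = [experts[i] @ x if i in digital_set else analog_matvec(experts[i], x)
           for i in range(n_experts)]
```

Because the split is decided by measurement on the frozen model, no retraining is needed, matching the summary's claim.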
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 6
🧠 SuperLocalMemory is a new privacy-preserving memory system for multi-agent AI that defends against memory poisoning attacks through local-first architecture and Bayesian trust scoring. The open-source system eliminates cloud dependencies while providing personalized retrieval through adaptive learning-to-rank, demonstrating strong performance metrics including 10.6ms search latency and 72% trust degradation for sleeper attacks.
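Bayesian trust scoring of this kind is commonly built on a Beta posterior; the rule below is a generic sketch under that assumption, not SuperLocalMemory's exact mechanism:

```python
# Hypothetical sketch of Bayesian trust scoring: each memory source carries a
# Beta(alpha, beta) belief updated on verified-good vs. flagged writes; a
# "sleeper" source that behaves and then starts poisoning sees its score fall.
class TrustScore:
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0   # uniform prior

    def update(self, good):
        if good:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def score(self):
        return self.alpha / (self.alpha + self.beta)  # posterior mean

honest, sleeper = TrustScore(), TrustScore()
for _ in range(20):
    honest.update(True)            # consistently verified writes
for i in range(20):
    sleeper.update(i < 10)         # 10 good writes, then 10 poisoned ones
```

A sleeper attacker cannot bank unlimited goodwill: each flagged write pulls the posterior mean back toward the prior, which is the kind of degradation the summary's 72% figure quantifies.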
AI Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers released the ERI benchmark, a comprehensive dataset spanning 9 engineering fields and 55 subdomains to evaluate large language models' engineering capabilities. The benchmark tested 7 LLMs across 57,750 records, revealing a clear three-tier performance structure with frontier models like GPT-5 and Claude Sonnet 4 significantly outperforming mid-tier and smaller models.
AI Neutral · arXiv – CS AI · Mar 4 · 7/10 · 5
🧠 Researchers introduce Federated Inference (FI), a new collaborative paradigm where independently trained AI models can work together at inference time without sharing data or model parameters. The study identifies key requirements including privacy preservation and performance gains, while highlighting system-level challenges that differ from traditional federated learning approaches.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers introduce Skywork-Reward-V2, a suite of AI reward models trained on SynPref-40M, a massive 40-million preference pair dataset created through human-AI collaboration. The models achieve state-of-the-art performance across seven major benchmarks by combining human annotation quality with AI scalability for better preference learning.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers introduce OptMerge, a new benchmark and method for combining multiple expert Multimodal Large Language Models (MLLMs) into single, more capable models without requiring additional training data. The approach achieves 2.48% average performance gains while reducing storage and serving costs by merging models across different modalities like vision, audio, and video.
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers introduce VC-STaR, a new framework that improves visual reasoning in vision-language models by using contrastive image pairs to reduce hallucinations. The approach also contributes VisCoR-55K, a new dataset; models fine-tuned on it outperform existing visual reasoning methods.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠 Researchers propose CoDAR, a new continuous diffusion language model framework that addresses key bottlenecks in token rounding through a two-stage approach combining continuous diffusion with an autoregressive decoder. The model demonstrates substantial improvements in generation quality over existing latent diffusion methods and becomes competitive with discrete diffusion language models.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers developed Unveiler, a robotic manipulation framework that uses object-centric spatial reasoning to retrieve items from cluttered environments. The system achieves up to 97.6% success in simulation by separating high-level spatial reasoning from low-level action execution, and demonstrates zero-shot transfer to real-world scenarios.
AI Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers prove 'selection theorems' showing that AI agents achieving low regret on prediction tasks must develop internal predictive models and belief states. The work demonstrates that structured internal representations are mathematically necessary, not just helpful, for competent decision-making under uncertainty.
AI Neutral · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 Researchers propose Credibility Governance (CG), a new mechanism that improves collective decision-making on online platforms by dynamically scoring agent and opinion credibility based on alignment with emerging evidence. Testing in simulated environments shows CG outperforms traditional voting and stake-weighted systems, offering better resistance to misinformation and manipulation.
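The core idea can be sketched as credibility-weighted voting. The multiplicative scoring rule, agent names, and histories below are assumptions for illustration, not the paper's CG mechanism:

```python
# Hypothetical sketch of credibility-weighted aggregation: agents whose past
# opinions matched later evidence earn weight, so a numerical majority of
# low-credibility manipulators can be outvoted.
agents = {
    "alice": {"cred": 1.0, "vote": 1},
    "bob":   {"cred": 1.0, "vote": 1},
    "bot1":  {"cred": 1.0, "vote": 0},
    "bot2":  {"cred": 1.0, "vote": 0},
    "bot3":  {"cred": 1.0, "vote": 0},
}
history = {"alice": [1, 1, 1], "bob": [1, 0, 1],
           "bot1": [0, 0, 0], "bot2": [0, 0, 0], "bot3": [0, 0, 0]}
evidence = [1, 1, 1]   # what later turned out to be true

for name, state in agents.items():
    for guess, truth in zip(history[name], evidence):
        state["cred"] *= 1.5 if guess == truth else 0.5  # reward alignment

plain_majority = int(sum(a["vote"] for a in agents.values()) > len(agents) / 2)
mass = sum(a["cred"] * a["vote"] for a in agents.values())
weighted = int(mass / sum(a["cred"] for a in agents.values()) > 0.5)
```

Here three coordinated bots win a plain majority vote but lose the credibility-weighted one, illustrating the manipulation resistance the summary describes.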
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 Researchers have developed Geometry Aware Attention Guidance (GAG), a new method that improves diffusion model generation quality by optimizing attention-space extrapolation. The approach models attention dynamics as fixed-point iterations within Modern Hopfield Networks and applies Anderson Acceleration to stabilize the process while reducing computational costs.
AI Neutral · arXiv – CS AI · Mar 4 · 6/10 · 5
🧠 Researchers propose Human-Certified Module Repositories (HCMRs) as a new framework to ensure trustworthy software development in the AI era. The system combines human oversight with automated analysis to certify and curate reusable code modules, addressing growing security concerns as AI increasingly generates and assembles software components.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers propose a dual Randomized Smoothing framework that overcomes limitations of standard neural network robustness certification by using input-dependent noise variances instead of global ones. The method achieves strong performance at both small and large radii with gains of 15-20% on CIFAR-10 and 8-17% on ImageNet, while adding only 60% computational overhead.
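The global-versus-input-dependent contrast can be sketched with standard Monte Carlo smoothing. The toy classifier and the sigma(x) heuristic are illustrative assumptions, and the certification step itself is omitted:

```python
import numpy as np

# Sketch contrasting a single global noise scale with an input-dependent one
# for randomized-smoothing prediction (majority vote under Gaussian noise).
rng = np.random.default_rng(1)

def base_classifier(x):
    return int(x.sum() > 0.0)          # toy decision rule

def smoothed_predict(x, sigma, n=2000):
    """Majority vote of the base classifier under Gaussian input noise."""
    noise = rng.normal(scale=sigma, size=(n, x.size))
    votes = [base_classifier(x + eps) for eps in noise]
    return int(np.mean(votes) > 0.5)

def sigma_of_x(x):
    # hypothetical rule: shrink the noise for inputs near the boundary
    return 0.1 + 0.5 * min(1.0, abs(float(x.sum())))

x = np.array([0.4, 0.2, 0.1])
pred_global = smoothed_predict(x, sigma=0.5)        # one sigma for everything
pred_adaptive = smoothed_predict(x, sigma_of_x(x))  # sigma tailored to x
```

A single global sigma forces one trade-off between small-radius and large-radius certificates; letting sigma depend on the input is what allows gains at both ends, as the summary reports.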
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 xLLM is a new open-source Large Language Model inference framework that delivers significantly improved performance for enterprise AI deployments. The framework achieves 1.7-2.2x higher throughput compared to existing solutions like MindIE and vLLM-Ascend through novel architectural optimizations including decoupled service-engine design and intelligent scheduling.
AI Neutral · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠 Researchers propose a game-theoretic framework using Stackelberg equilibrium and Rapidly-exploring Random Trees (RRT) to model interactions between attackers trying to jailbreak LLMs and defensive AI systems. The framework provides a mathematical foundation for understanding and improving AI safety guardrails against prompt-based attacks.
AI Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 Researchers developed a new channel-adaptive AI algorithm that maximizes inference throughput in 6G edge computing networks by dynamically adjusting computational complexity based on channel conditions. The system uses integrated communication and computation (IC²) to optimize both feature compression and model complexity for mobile edge inference.
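The adaptation logic can be sketched as a budget check over early-exit model variants. The model table, latencies, and budget below are invented numbers, not the paper's system:

```python
# Illustrative sketch of channel-adaptive inference: pick the deepest
# early-exit variant whose feature-transmission plus compute time fits the
# per-sample latency budget at the current channel rate.
MODELS = [  # (name, accuracy, compute_ms, feature_kbits)
    ("exit-1", 0.80, 2.0, 8.0),
    ("exit-2", 0.88, 5.0, 16.0),
    ("exit-3", 0.93, 12.0, 32.0),
]

def pick_model(rate_kbps, budget_ms=20.0):
    """Return the most accurate exit that meets the latency budget."""
    best = MODELS[0][0]                     # fall back to the cheapest exit
    best_acc = -1.0
    for name, acc, compute_ms, kbits in MODELS:
        tx_ms = kbits / rate_kbps * 1000.0  # time to ship features uplink
        if compute_ms + tx_ms <= budget_ms and acc > best_acc:
            best, best_acc = name, acc
    return best

fast_link = pick_model(rate_kbps=8000.0)   # good channel -> deepest exit
slow_link = pick_model(rate_kbps=500.0)    # poor channel -> shallow exit
```

Coupling the choice of compute depth to the current channel rate is the essence of jointly optimizing communication and computation in the IC² setting.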
AI Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 Researchers introduce the Large Electron Model, a neural network that uses a Fermi Sets architecture to predict ground-state wavefunctions of interacting electrons across different Hamiltonian parameters. The model demonstrates accurate predictions for up to 50 particles and generalizes across unseen coupling strengths, potentially advancing materials discovery beyond density functional theory limitations.