12,860 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose CroBo, a new visual state representation learning framework that helps robotic agents better understand dynamic environments by encoding both semantic identities and spatial locations of scene elements. The framework uses a global-to-local reconstruction method that compresses observations into compact tokens, achieving state-of-the-art performance on robot policy learning benchmarks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed a new audio-visual speech enhancement framework that uses Large Language Models and reinforcement learning to improve speech quality. The method outperforms existing baselines by using LLM-generated natural language feedback as rewards for model training, providing more interpretable optimization compared to traditional scalar metrics.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have introduced UVLM (Universal Vision-Language Model Loader), a Google Colab-based framework that provides a unified interface for loading, configuring, and benchmarking multiple Vision-Language Model architectures. The framework currently supports LLaVA-NeXT and Qwen2.5-VL models and enables researchers to compare different VLMs using identical evaluation protocols on custom image analysis tasks.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Research reveals that humans can detect credibility issues in deepfake videos through visual and audio distortions. Three experiments show that both technical artifacts and distortions in synthetic media reduce perceived credibility, though understanding of human perception of deepfakes remains limited.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed Temporal Aggregated Convolution (TAC) to accelerate spiking neural networks by aggregating spike frames before convolution, achieving a 13.8x speedup on rate-coded data. The study reveals that the optimal temporal aggregation strategy depends on the data type: collapsing the temporal dimension for rate-coded data while preserving it for event-based data.
🏢 Nvidia
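The speedup in the TAC item above comes from a linearity argument: because convolution is linear, summing rate-coded spike frames over time and convolving once gives the same result as convolving every frame and summing afterwards. A minimal sketch of that equivalence (illustrative only; shapes, names, and the naive convolution are assumptions, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 16, 8, 8                                   # time steps, spatial dims
spikes = rng.integers(0, 2, size=(T, H, W)).astype(float)  # binary spike frames
kernel = rng.normal(size=(3, 3))

def conv2d(x, k):
    """Naive 'valid' 2-D cross-correlation, written out for clarity."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

per_frame = sum(conv2d(spikes[t], kernel) for t in range(T))  # T convolutions
aggregated = conv2d(spikes.sum(axis=0), kernel)               # 1 convolution
assert np.allclose(per_frame, aggregated)
```

The equivalence holds exactly only for rate-coded data, where per-timestep ordering carries no information; for event-based data the temporal dimension must be kept, which matches the paper's finding.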
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have identified that multimodal large language models (MLLMs) lose visual focus during complex reasoning tasks, with attention becoming scattered across images rather than staying on relevant regions. They propose a training-free Visual Region-Guided Attention (VRGA) framework that improves visual grounding and reasoning accuracy by reweighting attention to question-relevant areas.
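The training-free reweighting idea in the VRGA item above can be sketched in a few lines: add a bonus to the attention logits of image tokens flagged as question-relevant before the softmax, shifting probability mass back to those regions. All names and values here are illustrative assumptions, not the paper's actual mechanism details:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reweight_attention(attn_logits, relevant_mask, boost=2.0):
    """Boost logits of question-relevant image tokens, then renormalise."""
    return softmax(attn_logits + boost * relevant_mask)

logits = np.array([0.5, 0.5, 0.5, 0.5])   # scattered, uniform attention
mask = np.array([0.0, 1.0, 1.0, 0.0])     # tokens inside the relevant region
probs = reweight_attention(logits, mask)  # mass concentrates on tokens 1 and 2
```

Because the intervention only edits attention logits at inference time, no retraining or fine-tuning is needed, which is what "training-free" refers to.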
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed a resource-efficient framework for compressing large language models using knowledge distillation and chain-of-thought reinforcement learning. The method successfully compressed Qwen 3B to 0.5B while retaining 70-95% of performance across English, Spanish, and coding tasks, making AI models more suitable for resource-constrained deployments.
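The compression item above rests on standard knowledge distillation, where a small student model is trained to match the teacher's softened output distribution. A minimal sketch of that loss (the temperature and names are generic assumptions; the paper's exact objective also involves chain-of-thought reinforcement learning, which is not shown):

```python
import numpy as np

def softmax(x, t=1.0):
    z = x / t
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, t=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by t^2 as in standard distillation."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * t * t
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the distributions diverge, which is what lets a 0.5B student absorb behaviour from a 3B teacher.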
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed a framework to make large language model-based query expansion more efficient by distilling knowledge from powerful teacher models into compact student models. The approach uses retrieval feedback and preference alignment to maintain 97% of the original performance while dramatically reducing inference costs.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed REFINE-DP, a hierarchical framework that combines diffusion policies with reinforcement learning to enable humanoid robots to perform complex loco-manipulation tasks. The system achieves over 90% success rate in simulation and demonstrates smooth autonomous execution in real-world environments for tasks like door traversal and object transport.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced QuarkMedBench, a new benchmark for evaluating large language models on real-world medical queries using over 20,000 queries across clinical care scenarios. The benchmark addresses limitations of current medical AI evaluations that rely on multiple-choice questions by using an automated scoring framework that achieves 91.8% concordance with clinical expert assessments.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed MR-GNF, a lightweight AI model that performs regional weather forecasting using multi-resolution graph neural networks on ellipsoidal meshes. The model achieves competitive accuracy with traditional numerical weather prediction systems while using significantly less computational resources (under 80 GPU-hours on a single RTX 6000 Ada).
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed NCCL EP, a new communication library for Mixture-of-Experts (MoE) AI model architectures that improves GPU-initiated communication performance. The library provides unified APIs supporting both low-latency inference and high-throughput training modes, built entirely on NVIDIA's NCCL Device API.
🏢 Nvidia
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed Resolving Interference (RI), a new framework that improves AI model merging by reducing cross-task interference when combining specialized models. The method makes models functionally orthogonal to other tasks using only unlabeled data, improving merging performance by up to 3.8% and generalization by up to 2.3%.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Geo-ADAPT, a new AI framework using Vision-Language Models for image geo-localization that adapts reasoning depth based on image complexity. The system uses an Optimized Locatability Score and specialized dataset to achieve state-of-the-art performance while reducing AI hallucinations.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠NormCode Canvas v1.1.3 introduces a case-based reasoning system for LLM agentic workflows using a semi-formal planning language called NormCode. The deployed system demonstrates multi-step AI task automation across presentation generation, code assistance, and plan compilation with self-sustaining capabilities.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce IGU-LoRA, a new parameter-efficient fine-tuning method for large language models that adaptively allocates ranks across layers using integrated gradients and uncertainty-aware scoring. The approach addresses limitations of existing methods like AdaLoRA by providing more stable and accurate layer importance estimates, consistently outperforming baselines across diverse tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce GPrune-LLM, a new structured pruning framework that improves compression of large language models by addressing calibration bias and cross-task generalization issues. The method partitions neurons into behavior-consistent modules and uses adaptive metrics based on distribution sensitivity, showing consistent improvements in post-compression performance.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers identify three critical gaps in the Model Context Protocol (MCP) that prevent AI agents from operating safely at production scale, despite MCP having over 10,000 active servers and 97 million monthly SDK downloads. The paper proposes three new mechanisms to address missing identity propagation, adaptive tool budgeting, and structured error semantics based on enterprise deployment experience.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose Outcome-Aware Tool Selection (OATS), a method to improve tool selection in LLM inference gateways by interpolating tool embeddings toward successful query centroids without adding latency. The approach improves tool selection accuracy on benchmarks while maintaining single-digit millisecond CPU processing times.
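The centroid-interpolation step in the OATS item above is a cheap offline operation: each tool's embedding is nudged toward the centroid of queries that previously succeeded with it, so no latency is added at routing time. A minimal sketch under assumed names (`alpha`, the renormalisation step, and the cosine-routing convention are illustrative, not taken from the paper):

```python
import numpy as np

def refine_tool_embeddings(tool_embs, success_centroids, alpha=0.3):
    """Linearly interpolate each tool embedding toward the centroid of its
    historically successful queries, then renormalise to unit length so
    cosine-similarity routing remains scale-consistent."""
    refined = (1 - alpha) * tool_embs + alpha * success_centroids
    return refined / np.linalg.norm(refined, axis=-1, keepdims=True)

tools = np.array([[1.0, 0.0], [0.0, 1.0]])      # two tool embeddings
centroids = np.array([[0.6, 0.8], [0.8, 0.6]])  # success-query centroids
refined = refine_tool_embeddings(tools, centroids)
```

Since the refined embeddings are precomputed, the gateway's per-request work is unchanged, consistent with the single-digit-millisecond CPU processing times the summary reports.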
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Flare, a new AI fairness framework that ensures ethical outcomes without requiring demographic data, addressing privacy and regulatory concerns in human-centered AI applications. The system uses Fisher Information to detect hidden biases and includes a novel evaluation metric suite called BHE for measuring ethical fairness beyond traditional statistical measures.
🏢 Meta
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce FL-I2MoE, a new Mixture-of-Experts layer for multimodal Transformers that explicitly identifies synergistic and redundant cross-modal feature interactions. The method provides more interpretable explanations for how different data modalities contribute to AI decision-making compared to existing approaches.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose FOUL (Federated On-server Unlearning), a new framework for efficiently removing specific participants' data from federated learning models without accessing client data. The approach reduces computational and communication costs while maintaining privacy compliance through a two-stage process that performs unlearning operations on the server side.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed LUMINA, a new Graph Convolutional Network architecture that improves AI-driven diagnosis of neurodevelopmental disorders using fMRI brain data. The system achieved 84.66% accuracy for ADHD and 88.41% for autism spectrum disorder detection by addressing traditional GCN limitations in capturing neural connection dynamics.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce PolyGLU, a new transformer architecture that enables dynamic routing among multiple activation functions, mimicking biological neural diversity. The 597M-parameter PolychromaticLM model shows emergent specialization patterns and achieves strong performance despite training on significantly fewer tokens than comparable models.
🏢 Nvidia
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose Latent Entropy-Aware Decoding (LEAD), a new method to reduce hallucinations in multimodal large reasoning models by switching between continuous and discrete token embeddings based on entropy states. The technique addresses issues where transition words correlate with high-entropy states that lead to unreliable outputs in visual question answering tasks.
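The entropy-gated switch in the LEAD item above can be sketched simply: when the next-token distribution is low-entropy (confident), use the discrete argmax token's embedding; when it is high-entropy, fall back to a continuous probability-weighted mixture of embeddings. The threshold value and function names here are assumptions for illustration, not the paper's actual settings:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_embedding(probs, emb_table, threshold=0.5):
    if entropy(probs) < threshold:
        return emb_table[np.argmax(probs)]  # discrete: confident prediction
    return probs @ emb_table                # continuous: mixture of embeddings

emb = np.eye(3)                             # toy 3-token embedding table
confident = np.array([0.98, 0.01, 0.01])    # low entropy -> discrete path
uncertain = np.array([1/3, 1/3, 1/3])       # high entropy -> continuous path
```

The continuous path preserves the model's uncertainty through high-entropy transition points instead of forcing a possibly wrong hard commitment, which is the hallucination-reduction intuition the summary describes.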