🧠

AI

12,967 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

Researchers have developed MM-Mem, a new pyramidal multimodal memory architecture that enables AI systems to better understand long-horizon videos by mimicking human cognitive memory processes. The system addresses current limitations in multimodal large language models by creating a hierarchical memory structure that progressively distills detailed visual information into high-level semantic understanding.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7

PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology

Researchers created PanCanBench, a comprehensive benchmark evaluating 22 large language models on pancreatic cancer-related patient questions, revealing significant variations in clinical accuracy and high hallucination rates. The study found that even top-performing models like GPT-4o and Gemini-2.5 Pro had hallucination rates of 6%, while newer reasoning-optimized models didn't consistently improve factual accuracy.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models

New theoretical research analyzes how Large Language Models learn during pretraining versus post-training phases, revealing that balanced pretraining data creates latent capabilities activated later, while supervised fine-tuning works best on small, challenging datasets and reinforcement learning requires large-scale data that isn't overly difficult.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models

Researchers introduce GlassMol, a new interpretable AI model for molecular property prediction that addresses the black-box problem in drug discovery. The model uses Concept Bottleneck Models with automated concept curation and LLM-guided selection, achieving performance that matches or exceeds traditional black-box models across thirteen benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

MOSAIC is a new open-source platform that enables cross-paradigm comparison and evaluation of different AI agents including reinforcement learning, large language models, vision-language models, and human decision-makers within the same environment. The platform introduces three key technical contributions: an IPC-based worker protocol, operator abstraction for unified interfaces, and a deterministic evaluation framework for reproducible research.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 8

LLM Self-Explanations Fail Semantic Invariance

Research reveals that Large Language Model (LLM) self-explanations fail semantic invariance testing, showing that AI models' self-reports change based on how tasks are framed rather than actual task performance. Four frontier AI models demonstrated unreliable self-reporting when faced with semantically different but functionally identical tool descriptions, raising questions about using model self-reports as evidence of capability.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 9

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

A study reveals that safety-aligned large language models exhibit "Defensive Refusal Bias," refusing legitimate cybersecurity defense tasks 2.72x more often when they contain security-sensitive keywords. The research found particularly high refusal rates for critical defensive operations like system hardening (43.8%) and malware analysis (34.3%), suggesting current AI safety measures rely on semantic similarity rather than understanding intent.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

Spectral Attention Steering for Prompt Highlighting

Researchers introduce SEKA and AdaSEKA, new training-free methods for attention steering in AI models that work with memory-efficient implementations like FlashAttention. These techniques enable better prompt highlighting by directly editing key embeddings using spectral decomposition, offering significant performance improvements with lower computational overhead.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8

The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction

The MAMA-MIA Challenge introduced a large-scale benchmark for AI-powered breast cancer tumor segmentation and treatment response prediction using MRI data from 1,506 US patients for training and 574 European patients for testing. Results from 26 international teams revealed significant performance variability and trade-offs between accuracy and fairness across demographic subgroups when AI models were tested across different institutions and continents.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models

Researchers propose Likelihood-Free Policy Optimization (LFPO), a new framework for improving Diffusion Large Language Models by bypassing likelihood computation issues that plague existing methods. LFPO uses geometric velocity rectification to optimize denoising logits directly, achieving better performance on code and reasoning tasks while reducing inference time by 20%.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Mean-Flow based One-Step Vision-Language-Action

Researchers developed a Mean-Flow based One-Step Vision-Language-Action (VLA) approach that dramatically improves robotic manipulation efficiency by eliminating iterative sampling requirements. The new method achieves 8.7x faster generation than SmolVLA and 83.9x faster than Diffusion Policy in real-world robotic experiments.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation

Researchers developed KG-Followup, a knowledge graph-augmented large language model system that generates medical follow-up questions for pre-diagnostic assessment. The system combines structured medical domain knowledge with LLMs to improve clinical diagnosis efficiency, outperforming existing methods by 5-8% in recall benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

Monocular 3D Object Position Estimation with VLMs for Human-Robot Interaction

Researchers developed a Vision-Language Model capable of estimating 3D object positions from monocular RGB images for human-robot interaction. The model achieved a median accuracy of 13mm and can make acceptable predictions for robot interaction in 25% of cases, representing a five-fold improvement over baseline methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

Researchers developed VisNec, a framework that identifies which training samples truly require visual reasoning for multimodal AI instruction tuning. The method achieves equivalent performance using only 15% of training data by filtering out visually redundant samples, potentially making multimodal AI training more efficient.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

Attention Smoothing Is All You Need For Unlearning

Researchers propose Attention Smoothing Unlearning (ASU), a new framework that helps Large Language Models forget sensitive or copyrighted content without losing overall performance. The method uses self-distillation and attention smoothing to erase specific knowledge while maintaining coherent responses, outperforming existing unlearning techniques.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents

Researchers developed TARSE, a new AI system for clinical decision-making that retrieves relevant medical skills and experiences from curated libraries to improve reasoning accuracy. The system performs test-time adaptation to align language models with clinically valid logic, showing improvements over existing medical AI baselines in question-answering benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 6

Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence

Researchers identified Self-Anchoring Calibration Drift (SACD), where large language models show systematic confidence changes when building on their own outputs in multi-turn conversations. Testing Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 revealed model-specific patterns, with Claude showing decreasing confidence and significant calibration errors, while GPT-5.2 exhibited opposite behavior in open-ended domains.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning

Researchers propose TC-SSA, a token compression framework that enables large vision-language models to process gigapixel pathology images by reducing visual tokens to 1.7% of the original count while maintaining diagnostic accuracy. The method achieves 78.34% overall accuracy on SlideBench and demonstrates strong performance across multiple cancer classification tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

A Deep Learning Framework for Heat Demand Forecasting using Time-Frequency Representations of Decomposed Features

Researchers developed a deep learning framework using Continuous Wavelet Transform and CNNs for heat demand forecasting in district heating systems. The model achieved a 36-43% reduction in forecasting errors compared to existing methods, reaching up to 95% accuracy in predicting day-ahead heat demand across multiple European cities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection

Researchers introduce GRAD-Former, a novel AI framework for detecting changes in satellite imagery that outperforms existing methods while using fewer computational resources. The system uses gated attention mechanisms and differential transformers to more efficiently identify semantic differences in very high-resolution satellite images.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 10

MedCollab: Causal-Driven Multi-Agent Collaboration for Full-Cycle Clinical Diagnosis via IBIS-Structured Argumentation

Researchers have developed MedCollab, a multi-agent AI framework that uses structured argumentation and causal reasoning to improve clinical diagnosis accuracy. The system outperforms traditional LLMs by reducing medical hallucinations and providing more transparent, clinically compliant diagnostic processes through hierarchical consultation workflows.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization

Researchers introduce TripleSumm, a novel AI architecture that adaptively fuses visual, text, and audio modalities for improved video summarization. The team also releases MoSu, the first large-scale benchmark dataset providing all three modalities for multimodal video summarization research.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations

Researchers propose PR-A²CL, a new AI method for solving compositional visual relations tasks by identifying outlier images among sets that follow the same compositional rules. The approach uses augmented anomaly contrastive learning and a predict-and-verify paradigm, showing significant performance improvements over existing visual reasoning models on benchmark datasets.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Researchers propose ClinCoT, a new framework for medical AI that improves Visual Language Models by grounding reasoning in specific visual regions rather than just text. The approach reduces factual hallucinations in medical AI systems by using visual chain-of-thought reasoning with clinically relevant image regions.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Unified Vision-Language Modeling via Concept Space Alignment

Researchers introduce V-SONAR, a vision-language embedding system that extends text-only SONAR to support 1500+ languages with vision capabilities. The system demonstrates state-of-the-art performance on video captioning and multilingual vision tasks through V-LCM, which combines vision and language processing in a unified framework.
