🧠 AI

12,746 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

Researchers developed a Co-Regulation Design Agentic Loop (CRDAL) that uses metacognitive agents to reduce design fixation in AI-driven engineering design. On battery pack design tasks, the system outperformed traditional approaches without significantly increasing computational cost.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Researchers propose TAG-MoE, a framework for unified image generation and editing models that makes Mixture-of-Experts routing decisions task-aware rather than task-agnostic. The system uses hierarchical task semantic annotation and predictive alignment regularization to reduce task interference and improve model performance.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Researchers introduce TimeLens, a family of multimodal large language models optimized for video temporal grounding that outperforms existing open-source models and even surpasses proprietary models like GPT-5 and Gemini-2.5-Flash. The work addresses critical data quality issues in existing benchmarks and introduces improved training datasets and algorithmic design principles.

🧠 GPT-5 · 🧠 Gemini

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

Researchers have developed the first formal mathematical framework for verifying AI agent protocols, specifically comparing Schema-Guided Dialogue (SGD) and Model Context Protocol (MCP). They proved these systems are structurally similar but identified critical gaps in MCP's capabilities, proposing MCP+ extensions to achieve full equivalence with SGD.

AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

The Information Dynamics of Generative Diffusion

Researchers present a unified theoretical framework for understanding generative diffusion models by connecting information theory, dynamics, and thermodynamics. The study reveals that diffusion generation operates as controlled noise-induced symmetry breaking, where the score function regulates information flow from noise to structured data.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Mapping the Course for Prompt-based Structured Prediction

Researchers propose combining large language models (LLMs) with combinatorial inference to address hallucinations and improve structured prediction accuracy. The study finds that incorporating symbolic inference yields more consistent predictions than prompting alone, with calibration and fine-tuning further enhancing performance on complex tasks.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Researchers introduce ArtiAgent, an automated system that creates pairs of real and artifact-injected images to help AI models better detect and fix visual artifacts in generated content. The system uses three specialized agents to synthesize 100K annotated images, addressing the high cost and poor scalability of human-labeled artifact datasets.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Instruction Following by Principled Boosting Attention of Large Language Models

Researchers developed InstABoost, a new method to improve instruction following in large language models by boosting attention to instruction tokens without retraining. The technique addresses reliability issues where LLMs violate constraints under long contexts or conflicting user inputs, achieving better performance than existing methods across 15 tasks.
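A training-free attention boost can be sketched on a single attention row. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the boost multiplies the post-softmax weights of instruction tokens by a factor and then renormalizes. The names `boost_attention` and `alpha` are hypothetical.

```python
from __future__ import annotations

import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boost_attention(
    scores: list[float], instruction_mask: list[bool], alpha: float
) -> list[float]:
    """Hypothetical boost: scale the attention weights of instruction
    tokens by alpha, then renormalize so the row still sums to 1."""
    if len(scores) != len(instruction_mask):
        raise ValueError("scores and mask must have the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    weights = softmax(scores)
    boosted = [w * alpha if is_instr else w
               for w, is_instr in zip(weights, instruction_mask)]
    total = sum(boosted)
    return [b / total for b in boosted]


if __name__ == "__main__":
    # One query attending over 5 tokens; the first two are instruction tokens.
    row = [0.5, 0.2, 1.5, 1.0, 0.8]
    mask = [True, True, False, False, False]
    before = softmax(row)
    after = boost_attention(row, mask, alpha=3.0)
    print(f"instruction attention mass before: {sum(before[:2]):.3f}")
    print(f"instruction attention mass after:  {sum(after[:2]):.3f}")
```

Scaling post-softmax weights by `alpha` and renormalizing is equivalent to adding `log(alpha)` to the pre-softmax scores of the instruction tokens. The same idea could therefore be applied on either side of the softmax.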

AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

Researchers introduce a framework for evaluating how well large language models understand their own knowledge limitations, finding that traditional confidence metrics miss key differences between models. The study shows that models with similar accuracy can have very different metacognitive abilities, that is, different capacities to know what they don't know.

🧠 Llama
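Signal detection theory separates how often a model is right from how well its confidence discriminates right answers from wrong ones. The sketch below computes a simple type-2 sensitivity (d' applied to confidence judgments) from `(is_correct, confidence)` pairs. It is an illustration, not the paper's metric. In the SDT literature, metacognitive efficiency is usually expressed as meta-d'/d', which requires a model fit omitted here. The function name, the binary confidence threshold and the log-linear correction are assumptions.

```python
from __future__ import annotations

from statistics import NormalDist


def type2_dprime(records: list[tuple[bool, float]], threshold: float = 0.5) -> float:
    """Type-2 d': how well confidence separates correct from incorrect answers.

    records: (is_correct, confidence) pairs, with confidence in [0, 1].
    A "hit" is high confidence on a correct answer; a "false alarm" is
    high confidence on an incorrect answer. A log-linear correction
    (+0.5 per count, +1 per total) keeps both rates strictly inside (0, 1).
    """
    z = NormalDist().inv_cdf
    correct = [conf for ok, conf in records if ok]
    wrong = [conf for ok, conf in records if not ok]
    if not correct or not wrong:
        raise ValueError("need both correct and incorrect answers")
    hits = sum(conf >= threshold for conf in correct)
    false_alarms = sum(conf >= threshold for conf in wrong)
    hit_rate = (hits + 0.5) / (len(correct) + 1)
    fa_rate = (false_alarms + 0.5) / (len(wrong) + 1)
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Two hypothetical models, both answering 6 of 8 questions correctly.
    model_a = [(True, 0.9)] * 6 + [(False, 0.2)] * 2  # unsure when wrong
    model_b = [(True, 0.9)] * 6 + [(False, 0.9)] * 2  # always confident
    for name, recs in (("model_a", model_a), ("model_b", model_b)):
        acc = sum(ok for ok, _ in recs) / len(recs)
        print(f"{name}: accuracy={acc:.2f}, type-2 d'={type2_dprime(recs):.2f}")
```

Both hypothetical models have the same accuracy, but only the first lowers its confidence when it is wrong, so it receives the higher type-2 score.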

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

Researchers developed SAVe, a self-supervised AI framework that detects audio-visual deepfakes by learning from authentic videos rather than synthetic ones. The system identifies visual artifacts and audio-visual misalignment patterns to detect manipulated content, showing strong cross-dataset generalization capabilities.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization

Researchers developed a framework using large language models (LLMs) as adaptive controllers for SIMP topology optimization, replacing fixed-schedule continuation with real-time parameter adjustments. The LLM agent achieved 5.7% to 18.1% better performance than baseline methods across multiple 2D and 3D engineering problems.
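In SIMP (Solid Isotropic Material with Penalization), each element's stiffness is interpolated from its density x as E(x) = Emin + x^p (E0 - Emin). Continuation gradually raises the penalty exponent p so that intermediate densities become uneconomical. The sketch shows this interpolation together with a fixed-step continuation schedule of the kind the LLM controller replaces. The schedule values (start at 1.0, add 0.5 every 20 iterations, cap at 3.0) are hypothetical.

```python
def simp_modulus(x: float, p: float, e0: float = 1.0, emin: float = 1e-9) -> float:
    """Modified SIMP interpolation: E(x) = Emin + x**p * (E0 - Emin)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    return emin + x**p * (e0 - emin)


def fixed_continuation(
    iteration: int,
    p_start: float = 1.0,
    step: float = 0.5,
    every: int = 20,
    p_max: float = 3.0,
) -> float:
    """Hypothetical fixed schedule: raise p by `step` every `every`
    iterations, capped at p_max."""
    if iteration < 0 or every <= 0:
        raise ValueError("iteration must be >= 0 and every must be > 0")
    return min(p_start + step * (iteration // every), p_max)


if __name__ == "__main__":
    for it in (0, 20, 40, 60, 100):
        p = fixed_continuation(it)
        # Stiffness of a half-dense element relative to a fully solid one.
        ratio = simp_modulus(0.5, p) / simp_modulus(1.0, p)
        print(f"iter {it:3d}: p = {p:.1f}, E(0.5)/E(1) = {ratio:.3f}")
```

An adaptive controller would instead choose the next p from optimizer state, such as the recent compliance history. That input is an illustrative assumption, not a detail from the paper.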

AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders

Researchers benchmarked 20 multimodal AI models on neuroimaging tasks using MRI and CT scans. They found that recognizing technical attributes such as imaging modality is nearly solved, but diagnostic reasoning remains challenging. Gemini-2.5-Pro and GPT-5-Chat showed the strongest diagnostic performance, while the open-source MedGemma-1.5-4B showed promising results under few-shot prompting.

🏢 Meta · 🧠 GPT-5 · 🧠 Gemini

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system

Researchers developed a framework that integrates large language models with knowledge graphs to provide programming feedback and exercise recommendations. Across 4,956 code submissions, the hybrid GenAI-adaptive approach outperformed both traditional adaptive learning and a GenAI-only mode, producing more correct submissions and fewer incomplete attempts.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A

Researchers introduce the Agent Identity Protocol (AIP) with Invocation-Bound Capability Tokens (IBCTs) to address the lack of authentication in AI agent communication over the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocols. The protocol rejected 100% of attacks in testing and added a performance overhead of only 0.086% in real deployments.

🧠 Gemini

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

Researchers developed a multi-answer reinforcement learning approach that trains language models to generate multiple plausible answers with confidence estimates in a single forward pass, rather than collapsing to one dominant answer. The method shows improved diversity and accuracy across question-answering, medical diagnosis, and coding benchmarks while being more computationally efficient than existing approaches.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Experiential Reflective Learning for Self-Improving LLM Agents

Researchers introduce Experiential Reflective Learning (ERL), a framework that enables AI agents to improve performance by learning from past experiences and generating transferable heuristics. The method shows a 7.8% improvement in success rates on the Gaia2 benchmark compared to baseline approaches.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Reconstructing Spiking Neural Networks Using a Single Neuron with Autapses

Researchers propose TDA-SNN, a novel spiking neural network framework that uses a single neuron with time-delayed autapses to reconstruct traditional multilayer architectures. The approach significantly reduces neuron count and memory requirements while maintaining competitive performance, though at the cost of increased temporal latency.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Researchers introduce QuatRoPE, a novel positional embedding method that improves 3D spatial reasoning in Large Language Models by encoding object relations more efficiently. The method maintains linear scalability with the number of objects and preserves LLMs' original capabilities through the Isolated Gated RoPE Extension.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Researchers introduce TRAJEVAL, a diagnostic framework that breaks AI code agent performance down into three stages (search, read, edit) to pinpoint where failures occur, rather than reporting only binary pass/fail outcomes. Analyzing 16,758 trajectories, the authors found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2 to 4.6 percentage points while reducing costs by 20% to 31%.

🧠 GPT-5
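To make the three-stage decomposition concrete, the sketch below scores one trajectory by how many "gold" files each stage reached. Gold files are those changed by a reference patch. The sketch then reports the first stage that fell short. The `Step` schema, the per-stage recall metric and the gold-file criterion are hypothetical, and the paper's actual signals and metrics may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

STAGES = ("search", "read", "edit")


@dataclass
class Step:
    """One agent action (hypothetical schema): a stage label and the files it touched."""
    stage: str  # "search", "read", or "edit"
    files: set[str]


def stage_recall(trajectory: list[Step], gold_files: set[str]) -> dict[str, float]:
    """Fraction of gold files (changed by the reference patch) reached at each stage."""
    if not gold_files:
        raise ValueError("gold_files must be non-empty")
    reached: dict[str, set[str]] = {stage: set() for stage in STAGES}
    for step in trajectory:
        if step.stage not in reached:
            raise ValueError(f"unknown stage: {step.stage}")
        reached[step.stage] |= step.files
    return {stage: len(reached[stage] & gold_files) / len(gold_files) for stage in STAGES}


def first_failing_stage(recall: dict[str, float]) -> str | None:
    """Earliest stage that missed at least one gold file, or None if none did."""
    for stage in STAGES:
        if recall[stage] < 1.0:
            return stage
    return None


if __name__ == "__main__":
    # Hypothetical trajectory: the agent finds and reads both gold files
    # but edits only one of them.
    trajectory = [
        Step("search", {"a.py", "b.py", "c.py"}),
        Step("read", {"a.py", "b.py"}),
        Step("edit", {"b.py"}),
    ]
    recall = stage_recall(trajectory, gold_files={"a.py", "b.py"})
    print(recall, "first failing stage:", first_failing_stage(recall))
```

Locating the earliest stage that falls short separates, for example, an agent that never found the right file from one that found it but edited it incompletely. A single pass/fail label hides that difference.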

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset

Researchers fine-tuned LLaMA 3.1-8B for medical transcription in Finnish, a low-resource language, achieving strong semantic similarity despite low n-gram overlap. The study, which used simulated clinical conversations from students, demonstrates the feasibility of privacy-oriented, domain-specific language models for clinical documentation in underrepresented languages.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

Researchers propose X-OPD, a Cross-Modal On-Policy Distillation framework that improves Speech Large Language Models by aligning them with their text-based counterparts. The method uses token-level feedback from teacher models to close performance gaps in end-to-end speech systems while preserving their existing capabilities.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Self-Corrected Image Generation with Explainable Latent Rewards

Researchers introduce xLARD, a self-correcting framework for text-to-image generation that uses multimodal large language models to provide explainable feedback and improve alignment with complex prompts. The system employs a lightweight corrector that refines latent representations based on structured feedback, addressing challenges in generating images that match fine-grained semantics and spatial relations.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Researchers introduce R-C2, a reinforcement learning framework that improves multimodal AI reasoning by enforcing consistency between visual and textual representations. The system uses cycle-consistent training to resolve internal conflicts between modalities, improving reasoning accuracy by up to 7.6 points without requiring additional labeled data.

AI · Bearish · arXiv – CS AI · Mar 27 · 6/10

Back to Basics: Revisiting ASR in the Age of Voice Agents

Researchers introduced WildASR, a multilingual diagnostic benchmark revealing that current ASR systems suffer severe performance degradation in real-world conditions despite achieving near-human accuracy on curated tests. The study found that ASR models often hallucinate plausible but unspoken content under degraded inputs, creating safety risks for voice agents.
