AI

21,407 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

MAP: Mitigating Hallucinations in Large Vision-Language Models with Map-Level Attention Processing

Researchers developed MAP (Map-Level Attention Processing), a training-free method to reduce hallucinations in Large Vision-Language Models by treating hidden states as 2D semantic maps. The approach uses attention-based operations to better leverage factual information and improve consistency between generated text and visual inputs.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

Researchers introduce KramaBench, a comprehensive benchmark testing AI systems' ability to execute end-to-end data processing pipelines on real-world data lakes. The study reveals significant limitations in current AI systems, with the best performing system achieving only 55% accuracy in full data-lake scenarios and leading LLMs implementing just 20% of individual data tasks correctly.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence

This research survey examines Federated Learning (FL), a distributed machine learning approach that enables collaborative AI model training without centralizing sensitive data. The paper covers FL's technical challenges, privacy mechanisms, and applications across healthcare, finance, and IoT systems.

AI · Bearish · arXiv – CS AI · Mar 9 · 6/10

From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting

Researchers conducted a controlled study examining the effectiveness of large language models (LLMs) for time series forecasting, finding that existing approaches often overfit to small datasets. Despite some promise, LLMs did not consistently outperform models specifically trained on large-scale time series data.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Researchers introduce 3DThinker, a new framework that enables vision-language models to perform 3D spatial reasoning from limited 2D views without requiring 3D training data. The system uses a two-stage training approach to align 3D representations with foundation models and demonstrates superior performance across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Maximizing Asynchronicity in Event-based Neural Networks

Researchers have developed EVA (EVent Asynchronous feature learning), a new framework that improves event-based neural networks by adapting language modeling techniques to process asynchronous visual data from event cameras. EVA reports strong results on recognition and detection tasks, including 0.477 mAP on the Gen1 detection dataset.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

A comprehensive survey examines how large multimodal language models are transforming scientific research across five key areas: literature search, idea generation, content creation, multimodal artifact production, and peer review evaluation. The research highlights both the potential for AI-assisted scientific discovery and the ethical concerns regarding research integrity and misuse of generative models.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

ContextBench: Modifying Contexts for Targeted Latent Activation

Researchers have developed ContextBench, a new benchmark for evaluating methods that generate targeted inputs to trigger specific behaviors in language models. The study introduces enhanced Evolutionary Prompt Optimization techniques that better balance effectiveness in activating AI model features while maintaining linguistic fluency.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

MERIT Feedback Elicits Better Bargaining in LLM Negotiators

Researchers introduce AgoraBench, a new framework for improving Large Language Models' bargaining and negotiation capabilities through utility-based feedback mechanisms. The study reveals that current LLMs struggle with strategic depth in negotiations and proposes human-aligned metrics and training methods to enhance their performance.

AI · Bearish · arXiv – CS AI · Mar 9 · 6/10

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

Researchers developed a new framework to assess moral competence in large language models, finding that current evaluations may overestimate AI moral reasoning capabilities. While LLMs outperformed humans on standard ethical scenarios, they performed significantly worse when required to identify morally relevant information from noisy data.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

Researchers introduced VisioMath, a new benchmark with 1,800 K-12 math problems designed to test Large Multimodal Models' ability to distinguish between visually similar diagrams. The study reveals that current state-of-the-art models struggle with fine-grained visual reasoning, often relying on shallow positional heuristics rather than proper image-text alignment.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

This position paper argues against anthropomorphizing intermediate tokens generated by language models as 'reasoning traces' or 'thoughts'. The authors contend that treating these computational outputs as human-like thinking processes is misleading and potentially harmful to AI research and understanding.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Researchers introduced RAMoEA-QA, a new AI system that uses hierarchical specialization to answer questions about respiratory audio recordings from mobile devices. The system employs a two-stage routing approach with Audio Mixture-of-Experts and Language Mixture-of-Adapters to handle diverse recording conditions and query types, achieving 0.72 test accuracy compared to 0.61-0.67 for existing baselines.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations

Researchers introduce PONTE, a human-in-the-loop framework that creates personalized, trustworthy AI explanations by combining user preference modeling with verification modules. The system addresses the challenge of one-size-fits-all AI explanations by adapting to individual user expertise and cognitive needs while maintaining faithfulness and reducing hallucinations.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

Researchers developed a new training method to improve the robustness of AI foundation models like SAM3 for medical image segmentation by reducing sensitivity to prompt variations. The approach groups semantically similar prompts together and uses consistency constraints to ensure more reliable predictions across different prompt formulations.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

Researchers developed an AI system that can detect fetal orofacial clefts in ultrasound images with over 93% sensitivity and 95% specificity, matching senior radiologist performance. The system was trained on 45,139 ultrasound images from 9,215 fetuses across 22 hospitals and can also improve junior radiologist diagnostic accuracy by 6%.

🏢 Microsoft

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Dynamic Chunking Diffusion Transformer

Researchers introduce Dynamic Chunking Diffusion Transformer (DC-DiT), a new AI model that adaptively processes images by allocating more computational resources to detail-rich regions and fewer to uniform backgrounds. The system improves image generation quality while reducing computational costs by up to 16x compared to traditional diffusion transformers.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

ESAA-Security: An Event-Sourced, Verifiable Architecture for Agent-Assisted Security Audits of AI-Generated Code

Researchers have developed ESAA-Security, a new architecture for conducting secure, verifiable audits of AI-generated code using structured agent workflows rather than unstructured LLM conversations. The system creates an immutable audit trail through event-sourcing and produces comprehensive security reports across 26 tasks and 95 executable checks.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

A systematic literature review of 346 papers reveals critical flaws in AI data annotation practices, arguing that treating human disagreement as 'noise' rather than meaningful signal undermines model quality. The study proposes pluralistic annotation frameworks that embrace diverse human perspectives instead of forcing artificial consensus.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

MoEless: Efficient MoE LLM Serving via Serverless Computing

Researchers introduce MoEless, a serverless framework for serving Mixture-of-Experts Large Language Models that addresses expert load imbalance issues. The system reduces inference latency by 43% and costs by 84% compared to existing solutions by using predictive load balancing and optimized expert scaling strategies.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

Researchers developed DEX-AR, a new explainability method for autoregressive Vision-Language Models that generates 2D heatmaps to understand how these AI systems make decisions. The method addresses challenges in interpreting modern VLMs by analyzing token-by-token generation and visual-textual interactions, showing improved performance across multiple benchmarks.

🏢 Perplexity

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events

Researchers introduce CoE, a training-free multimodal summarization framework that uses a Chain-of-Events approach with Hierarchical Event Graph to better understand and summarize content across videos, transcripts, and images. The system achieves significant performance improvements over existing methods, showing average gains of +3.04 ROUGE, +9.51 CIDEr, and +1.88 BERTScore across eight datasets.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR

Researchers introduced RAPTOR, a study comparing compact SSL models for audio deepfake detection, finding that multilingual HuBERT pre-training enables smaller 100M parameter models to match larger commercial systems. The study reveals that pre-training approach matters more than model size, with WavLM variants showing overconfident miscalibration issues compared to HuBERT models.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion

Researchers introduce Place-it-R1, an AI framework that uses Multimodal Large Language Models to insert objects into videos while maintaining physical realism. The system employs Chain-of-Thought reasoning to ensure inserted objects interact naturally with their environment, addressing the gap between visual quality and physical plausibility in video editing.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

Researchers analyzed Vision-Language Models (VLMs) used in automated driving to understand why they fail on simple visual tasks. They identified two failure modes: perceptual failure, where visual information is not encoded, and cognitive failure, where the information is present but not properly aligned with language semantics.
