12,905 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce ARAS400k, a large-scale remote sensing dataset containing 400k images (100k real, 300k synthetic) with segmentation maps and descriptions. The study demonstrates that combining real and synthetic data consistently outperforms training on real data alone for semantic segmentation and image captioning tasks.
AI · Bearish · arXiv – CS AI · Mar 11 · 6/10
🧠A new research study reveals that Large Language Models (LLMs) propagate gender stereotypes and biases when processing healthcare data, particularly through interactions between gender and social determinants of health. The research used French patient records to demonstrate how LLMs rely on embedded stereotypes to make gendered decisions in healthcare contexts.
AI · Bearish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers have identified a critical flaw in Large Language Models (LLMs): they prioritize moral reasoning over commonsense understanding and struggle to detect logical contradictions within moral dilemmas. The study introduces the CoMoral benchmark and reveals a 'narrative focus bias': LLMs identify contradictions more readily when they are attributed to secondary characters than to primary narrators.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose TaSR-RAG, a new framework that improves Retrieval-Augmented Generation systems by using taxonomy-guided structured reasoning for better evidence selection. The system decomposes complex questions into triple sub-queries and performs step-wise evidence matching, achieving up to 14% performance improvements over existing RAG baselines on multi-hop question answering benchmarks.
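To make the decompose-then-match idea concrete, here is a minimal sketch of splitting a multi-hop question into ordered sub-queries and matching evidence step by step. Everything here is hypothetical: the hard-coded decomposition, the word-overlap retriever, and the toy corpus stand in for the taxonomy-guided LLM pipeline the paper actually describes.

```python
import re

def decompose(question: str) -> list[str]:
    """Split a multi-hop question into ordered sub-queries. A real
    system would use an LLM guided by a taxonomy; this hard-codes
    one example purely for illustration."""
    if "director" in question and "born" in question:
        return ["Who directed Inception?", "When was that person born?"]
    return [question]

def match_evidence(sub_query: str, corpus: dict[str, str]) -> str:
    """Step-wise evidence matching: pick the passage with the largest
    word overlap (a toy stand-in for dense retrieval)."""
    tokens = set(re.findall(r"\w+", sub_query.lower()))
    return max(corpus.values(),
               key=lambda p: len(tokens & set(re.findall(r"\w+", p.lower()))))

corpus = {
    "d1": "Inception was directed by Christopher Nolan.",
    "d2": "Christopher Nolan was born in 1970.",
}
# Each sub-query is matched against evidence in sequence.
evidence = [match_evidence(sq, corpus)
            for sq in decompose("When was the director of Inception born?")]
```

The point of the decomposition is that neither passage alone answers the original question, but each sub-query has a clean single-passage match.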
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠DuplexCascade introduces a VAD-free cascaded streaming pipeline that enables full-duplex speech-to-speech dialogue while maintaining LLM intelligence. The system converts traditional long utterance turns into micro-turn interactions using special control tokens to coordinate turn-taking and response timing.
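The micro-turn idea can be sketched as a simple coordination loop: long responses are chunked, and control tokens signal whether the agent is holding the floor, yielding it, or breaking off because the user started speaking. The token names and the logic below are invented for illustration; the summary does not specify DuplexCascade's actual control vocabulary.

```python
# Hypothetical control tokens; DuplexCascade's real vocabulary may differ.
HOLD, YIELD, INTERRUPT = "<hold>", "<yield>", "<interrupt>"

def emit_micro_turns(chunks: list[str], user_speaking: list[bool]) -> list[str]:
    """Split a long utterance into micro-turns, yielding the floor
    whenever the front end reports the user is speaking."""
    out = []
    for chunk, busy in zip(chunks, user_speaking):
        if busy:
            out.append(INTERRUPT)   # break off and let the user talk
            break
        out.append(chunk)
        out.append(HOLD)            # signal that more content is coming
    if out and out[-1] == HOLD:
        out[-1] = YIELD             # final chunk: hand over the turn
    return out
```

For example, an uninterrupted two-chunk response ends in `<yield>`, while a response cut off mid-way ends in `<interrupt>`, letting the downstream speech synthesizer coordinate turn-taking without a voice-activity detector.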
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce Latent-DARM, a framework that bridges discrete diffusion language models and autoregressive models to improve multi-agent AI reasoning capabilities. The system achieved significant improvements on reasoning benchmarks, increasing accuracy from 27% to 36% on DART-5 while using less than 2.2% of the token budget of state-of-the-art models.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce a new framework showing that emotional tone in text systematically affects how large language models process and reason over information. They developed AURA-QA, an emotionally balanced dataset, and proposed emotional regularization techniques that improve reading comprehension performance across multiple benchmarks.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers developed Arbiter, a framework to detect interference patterns in system prompts for LLM-based coding agents. Testing on major platforms (Claude, Codex, Gemini) revealed 152 findings and 21 interference patterns, with one discovery leading to a Google patch for Gemini CLI's memory system.
🏢 OpenAI · 🏢 Anthropic · 🧠 Claude
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers analyzed gender bias in audio deepfake detection systems using fairness metrics beyond standard performance measures. The study found significant gender disparities in error distribution that conventional metrics like Equal Error Rate failed to detect, highlighting the need for fairness-aware evaluation in AI voice authentication systems.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce Semantic Level of Detail (SLoD), a framework for AI memory systems that uses heat kernel diffusion on hyperbolic manifolds to enable continuous resolution control in knowledge graphs. The method automatically detects meaningful abstraction levels without manual parameters, achieving perfect recovery on synthetic hierarchies and strong alignment with real-world taxonomies like WordNet.
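Heat-kernel diffusion, the core operation named here, can be illustrated on an ordinary graph: small diffusion times keep mass near a node (fine detail), large times spread it across the hierarchy (coarse detail). This toy uses a Euclidean graph Laplacian and makes no attempt to reproduce the paper's hyperbolic-manifold construction.

```python
import numpy as np

def heat_kernel(adj: np.ndarray, t: float) -> np.ndarray:
    """K_t = exp(-t L) via eigendecomposition of the graph Laplacian."""
    L = np.diag(adj.sum(axis=1)) - adj
    w, v = np.linalg.eigh(L)
    return v @ np.diag(np.exp(-t * w)) @ v.T

# Path graph a-b-c. Row 0 of K_t is the heat distribution after
# starting a unit of heat at node a.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
fine   = heat_kernel(adj, 0.1)[0]   # small t: mass stays local
coarse = heat_kernel(adj, 10.0)[0]  # large t: mass spreads evenly
```

Varying `t` continuously is what gives a "level of detail" knob: at `t → ∞` the kernel converges to the uniform distribution over the connected component, collapsing all fine structure.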
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce Test-Driven AI Agent Definition (TDAD), a methodology that compiles AI agent prompts from behavioral specifications using automated testing. The approach addresses production deployment challenges by ensuring measurable behavioral compliance and preventing silent regressions in tool-using LLM agents.
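The test-driven loop can be sketched as: express each behavioral requirement as a probe plus a predicate, then grow the prompt until every predicate passes, failing loudly on regression. The spec format, the refine step, and the `call_llm` stub are all invented for illustration; they are not TDAD's actual compilation procedure.

```python
def call_llm(prompt: str, query: str) -> str:
    """Stub model: obeys a rule only if it appears verbatim in the prompt."""
    if "answer in JSON" in prompt:
        return '{"answer": "%s"}' % query
    return query

# Behavioral spec: (probe query, predicate the response must satisfy).
SPECS = [("ping", lambda out: out.startswith("{"))]

def passes(prompt: str) -> bool:
    return all(check(call_llm(prompt, q)) for q, check in SPECS)

def compile_prompt(base: str, candidate_rules: list[str]) -> str:
    """Grow the prompt with rules until every behavioral spec passes."""
    prompt = base
    for rule in candidate_rules:
        if passes(prompt):
            break
        prompt += "\n" + rule      # refine, then re-test
    assert passes(prompt), "spec regression: no rule set satisfies the tests"
    return prompt
```

Running the specs on every change is what turns "the prompt seems to work" into a measurable compliance check and blocks silent regressions.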
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose a unified framework for latent world models in automated driving, organizing recent advances in generative AI and vision-language-action systems. The framework addresses scalable simulation, long-horizon forecasting, and decision-making through latent representations that compress multi-sensor data.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce DexHiL, a human-in-the-loop framework for improving Vision-Language-Action models in robotic dexterous manipulation tasks. The system allows real-time human corrections during robot execution and demonstrates 25% better success rates compared to standard offline training methods.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers have introduced Turn, a new compiled programming language specifically designed for building autonomous AI agents that use large language models. The language includes built-in features like cognitive type safety, confidence operators, and actor-based process models to address common challenges in agentic software development.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce SiliconMind-V1, a new multi-agent AI framework that generates Verilog hardware code with improved functional correctness. The system uses locally fine-tuned language models with integrated testing and debugging capabilities, outperforming existing methods while using fewer training resources.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce AutoAgent, a self-evolving multi-agent framework that combines evolving cognition, contextual decision-making, and elastic memory orchestration to enable adaptive autonomous agents. The system continuously learns from experience without external retraining and shows improved performance across retrieval, tool-use, and collaborative tasks compared to static baselines.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose CVS, a training-free method for selecting high-quality vision-language training data that requires genuine cross-modal reasoning. The method achieves better performance using only 10-15% of data compared to full dataset training, while reducing computational costs by up to 44%.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose a framework using policy-parameterized prompts to influence multi-agent LLM dialogue behavior without training. The approach treats prompts as actions and dynamically constructs them through five components to control conversation flow based on metrics like responsiveness and stance shift.
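"Prompts as actions" can be sketched as a policy that assembles a prompt from a component library based on observed dialogue metrics, with no gradient updates anywhere. The component names and the thresholds below are placeholders; the paper's five components are not enumerated in the summary.

```python
# Hypothetical component library; the paper's actual five components differ.
COMPONENTS = {
    "persona":    "You are a debate moderator.",
    "goal":       "Steer the group toward consensus.",
    "style_calm": "Respond briefly and neutrally.",
    "style_push": "Challenge the last speaker's claim.",
    "memory":     "Recent stance shifts: {shifts}",
}

def build_prompt(responsiveness: float, stance_shift: int) -> str:
    """Policy step: choose components from dialogue metrics, no training."""
    parts = [COMPONENTS["persona"], COMPONENTS["goal"]]
    # Low responsiveness: provoke engagement; otherwise keep things calm.
    parts.append(COMPONENTS["style_push"] if responsiveness < 0.5
                 else COMPONENTS["style_calm"])
    parts.append(COMPONENTS["memory"].format(shifts=stance_shift))
    return "\n".join(parts)
```

Because the "action" is a fully constructed prompt, the same frozen LLM can be steered toward different conversation dynamics turn by turn.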
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠A systematic review evaluates federated learning algorithms for edge computing environments, benchmarking five leading methods across accuracy, efficiency, and robustness metrics. The study finds SCAFFOLD achieves highest accuracy (0.90) while FedAvg excels in communication and energy efficiency, though challenges remain with data heterogeneity and energy limitations.
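FedAvg's communication-efficient aggregation step, the baseline the review benchmarks SCAFFOLD against, is simply a weighted average of client models by local dataset size. A minimal sketch:

```python
import numpy as np

def fedavg(client_weights: list[np.ndarray], n_samples: list[int]) -> np.ndarray:
    """Server-side FedAvg aggregation: average client parameter vectors
    weighted by the number of local training samples."""
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(client_weights, n_samples))

# Two clients; the one with 3x more data pulls the average toward itself.
avg = fedavg([np.array([0.0, 0.0]), np.array([1.0, 1.0])], [1, 3])
```

SCAFFOLD's accuracy edge under data heterogeneity comes from adding control variates to correct each client's drift before this averaging step, at the cost of extra state exchanged per round.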
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers developed a method using Large Language Models to create personalized fake news debunking messages tailored to individuals' Big Five personality traits. The study found that personalized debunking messages are more persuasive than generic ones, with Openness increasing persuadability and Neuroticism decreasing it.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠A new academic paper introduces context engineering as a discipline for managing AI agent decision-making environments, proposing a maturity model that includes prompt, context, intent, and specification engineering. The research addresses enterprise challenges in scaling multi-agent AI systems, with 75% of enterprises planning deployment within two years despite current scaling difficulties.
🏢 Google · 🏢 Anthropic
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠This comprehensive review examines FPGA-based AI accelerators as a promising solution for deep learning workloads, addressing the limitations of ASIC and GPU accelerators. The paper analyzes hardware optimizations including loop pipelining, parallelism, and quantization techniques that make FPGAs attractive for AI applications requiring high performance and energy efficiency.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose a new AI system called Telogenesis that generates attention priorities internally without external goals, using three epistemic gaps: ignorance, surprise, and staleness. The system demonstrates adaptive behavior and can discover environmental patterns autonomously, outperforming fixed strategies in experimental validation across 2,500 total runs.
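The three named gaps suggest a simple internal priority score: attend to what you have rarely visited (ignorance), what defied your predictions (surprise), and what you have not checked lately (staleness). The scoring function and weights below are invented for illustration and are not Telogenesis's actual mechanism.

```python
def attention_priority(visits: int, pred_error: float,
                       last_seen: float, now: float,
                       w=(1.0, 1.0, 0.1)) -> float:
    """Combine the three epistemic gaps into one internally generated
    priority; the weights w are arbitrary for this sketch."""
    ignorance = 1.0 / (1 + visits)   # rarely visited -> poorly known
    surprise  = pred_error           # prediction missed -> informative
    staleness = now - last_seen      # not checked lately -> re-check
    return w[0] * ignorance + w[1] * surprise + w[2] * staleness
```

Under such a score, a novel, surprising, long-unvisited target outranks a familiar one the agent just observed, so attention shifts without any externally supplied goal.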
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce PRECEPT, a new framework for AI language model agents that improves knowledge retrieval and adaptation through structured rule learning and conflict-aware memory systems. The framework shows significant performance improvements over existing methods, with 41% better first-try accuracy and enhanced compositional reasoning capabilities.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce Social-R1, a reinforcement learning framework that enhances social reasoning in large language models by training on adversarial examples. The approach enables a 4B parameter model to outperform larger models across eight benchmarks by supervising the entire reasoning process rather than just outcomes.