#research News & Analysis

The #research tag covers 919 indexed articles, with 15 published in the last 30 days. Recent coverage remains predominantly neutral at 73.3%, though bullish sentiment has declined 33.7 percentage points compared to the previous quarter, suggesting a cooling in tone. ArXiv's computer science and AI section dominates the source list, alongside research updates from Microsoft and OpenAI. Gemini, Llama, and GPT-4 are the most frequently discussed models in tagged articles, which often intersect with #machine-learning, #llm, and #artificial-intelligence topics. Cryptocurrency tokens including NEAR, LINK, and ETH appear regularly alongside this tag. Scan the article list below to explore recent developments.

sentiment · last 30d (15 articles) · -33.7pp bullish vs prior 90d

Top sources:arXiv – CS AI · 770Microsoft Research Blog · 3OpenAI News · 3MIT News – AI · 3The Register – AI · 2

Often co-tagged with:#machine-learning #llm #arxiv #artificial-intelligence #computer-vision #ai

Most-discussed entities:Gemini · 12Llama · 11GPT-4 · 8Claude · 8GPT-5 · 7

1035 articles

AIBullisharXiv – CS AI · Mar 176/10

🧠

Argumentation for Explainable and Globally Contestable Decision Support with LLMs

Researchers introduce ArgEval, a new framework that enhances Large Language Model decision-making through structured argumentation and global contestability. Unlike previous approaches limited to binary choices and local corrections, ArgEval maps entire decision spaces and builds reusable argumentation frameworks that can be globally modified to prevent repeated mistakes.

AIBullisharXiv – CS AI · Mar 176/10

🧠

OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence

Researchers introduce OpenHospital, a new interactive arena designed to develop and benchmark Large Language Model-based Collective Intelligence through physician-patient agent interactions. The platform uses a data-in-agent-self paradigm to rapidly enhance AI agent capabilities while providing evaluation metrics for medical proficiency and system efficiency.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Advancing Multimodal Agent Reasoning with Long-Term Neuro-Symbolic Memory

Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved 4.35% average improvement in reasoning accuracy over pure neural systems, with up to 12.5% gains on constrained reasoning tasks.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science

Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.

AINeutralarXiv – CS AI · Mar 176/10

🧠

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Researchers developed an information-theoretic framework to explain 'Aha moments' in large language models during reasoning tasks. The study reveals that strong reasoning performance stems from uncertainty externalization rather than specific tokens, decomposing LLM reasoning into procedural information and epistemic verbalization.

AIBearisharXiv – CS AI · Mar 176/10

🧠

Do Metrics for Counterfactual Explanations Align with User Perception?

A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.

AIBearisharXiv – CS AI · Mar 176/10

🧠

Artificial Intelligence: Beyound Ocularcentrism, the New Age of Humans Beyond the Spectacle

A research paper examines how AI-generated visual content is transforming society's relationship with reality and representation, intensifying visual media's dominance in shaping public consciousness. An experiment in Bolzano, Italy revealed people's strong preference for visually striking AI-generated urban development scenarios over practical solutions, highlighting how AI accelerates image commodification and deepens societal alienation.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Learning Retrieval Models with Sparse Autoencoders

Researchers introduce SPLARE, a new method that uses sparse autoencoders (SAEs) to improve learned sparse retrieval in language models. The technique outperforms existing vocabulary-based approaches in multilingual and out-of-domain settings, with SPLARE-7B achieving top results on multilingual retrieval benchmarks.

AINeutralarXiv – CS AI · Mar 176/10

🧠

Feature-level Interaction Explanations in Multimodal Transformers

Researchers introduce FL-I2MoE, a new Mixture-of-Experts layer for multimodal Transformers that explicitly identifies synergistic and redundant cross-modal feature interactions. The method provides more interpretable explanations for how different data modalities contribute to AI decision-making compared to existing approaches.

AIBullisharXiv – CS AI · Mar 176/10

🧠

PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks

Researchers introduce PolyGLU, a new transformer architecture that enables dynamic routing among multiple activation functions, mimicking biological neural diversity. The 597M-parameter PolychromaticLM model shows emergent specialization patterns and achieves strong performance despite training on significantly fewer tokens than comparable models.

🏢 Nvidia

AIBullisharXiv – CS AI · Mar 176/10

🧠

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

Researchers propose Latent Entropy-Aware Decoding (LEAD), a new method to reduce hallucinations in multimodal large reasoning models by switching between continuous and discrete token embeddings based on entropy states. The technique addresses issues where transition words correlate with high-entropy states that lead to unreliable outputs in visual question answering tasks.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Na\"ive PAINE: Lightweight Text-to-Image Generation Improvement with Prompt Evaluation

Researchers propose Naïve PAINE, a lightweight system that improves text-to-image generation quality by predicting which initial noise inputs will produce better results before running the full diffusion model. The approach reduces the need for multiple generation cycles to get satisfactory images by pre-selecting higher-quality noise patterns.

AINeutralarXiv – CS AI · Mar 166/10

🧠

Literary Narrative as Moral Probe : A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior

Researchers developed a new method to evaluate AI ethical reasoning using literary narratives from science fiction, testing 13 AI systems across 24 conditions. The study found that current AI systems perform surface-level ethical responses rather than genuine moral reasoning, with more sophisticated systems showing more complex failure modes.

🏢 Anthropic🏢 Microsoft🧠 Claude

AIBullisharXiv – CS AI · Mar 166/10

🧠

Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration

Researchers propose Eye2Eye, a new framework that uses first-person perspective to improve human-AI collaboration by addressing communication and understanding gaps. The AR prototype integrates joint attention coordination, revisable memory, and reflective feedback, showing significant improvements in task completion time and user trust in studies.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Researchers introduce Cheers, a unified multimodal AI model that combines visual comprehension and generation by decoupling patch details from semantic representations. The model achieves 4x token compression and outperforms existing models like Tar-1.5B while using only 20% of the training cost.

AINeutralarXiv – CS AI · Mar 166/10

🧠

Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets

A research study comparing causal reasoning abilities of 20+ large language models against human baselines found that LLMs exhibit more rule-like reasoning strategies than humans, who account for unmentioned factors. While LLMs don't mirror typical human cognitive biases in causal judgment, their rigid reasoning may fail when uncertainty is intrinsic, suggesting they can complement human decision-making in specific contexts.

AIBullisharXiv – CS AI · Mar 166/10

🧠

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Researchers have developed SAFE, a new framework for ensembling Large Language Models that selectively combines models at specific token positions rather than every token. The method improves both accuracy and efficiency in long-form text generation by considering tokenization mismatches and consensus in probability distributions.

AIBullisharXiv – CS AI · Mar 166/10

🧠

A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks

Researchers published a tutorial on cognitive biases in AI-driven 6G autonomous networks, focusing on how LLM-powered agents can inherit human biases that distort network management decisions. The paper introduces mitigation strategies that demonstrated 5x lower latency and 40% higher energy savings in practical use cases.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives

Researchers developed UNIFIER, a continual learning framework for multimodal large language models (MLLMs) to adapt to changing visual scenarios without catastrophic forgetting. The framework addresses visual discrepancies across different environments like high-altitude, underwater, low-altitude, and indoor scenarios, showing significant improvements over existing methods.

🏢 Hugging Face

AIBullisharXiv – CS AI · Mar 126/10

🧠

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

Researchers introduce a new framework for AI agent systems that automatically extracts learnings from execution trajectories to improve future performance. The system uses four components including trajectory analysis and contextual memory retrieval, achieving up to 14.3 percentage point improvements in task completion on benchmarks.

AIBullisharXiv – CS AI · Mar 126/10

🧠

FAME: Formal Abstract Minimal Explanation for Neural Networks

Researchers introduce FAME (Formal Abstract Minimal Explanations), a new method for explaining neural network decisions that scales to large networks while producing smaller explanations. The approach uses abstract interpretation and dedicated perturbation domains to eliminate irrelevant features and converge to minimal explanations more efficiently than existing methods.

AINeutralarXiv – CS AI · Mar 126/10

🧠

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization

Researchers propose Nurture-First Development (NFD), a new paradigm for building domain-expert AI agents through progressive growth via conversational interaction rather than traditional code-first or prompt-first approaches. The method uses a Knowledge Crystallization Cycle to convert operational dialogue into structured knowledge assets, demonstrated through a financial research agent case study.

AIBullisharXiv – CS AI · Mar 126/10

🧠

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Researchers conducted the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multi-task code analysis, showing that a single PEFT module can match full fine-tuning performance while reducing computational costs by up to 85%. The study found that even 1B-parameter models with multi-task PEFT outperform large general-purpose LLMs like DeepSeek and CodeLlama on code analysis tasks.

AINeutralarXiv – CS AI · Mar 126/10

🧠

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

Researchers introduce SpreadsheetArena, a platform for evaluating large language models' ability to generate spreadsheet workbooks from natural language prompts. The study reveals that preferred spreadsheet features vary significantly across use cases, and even top-performing models struggle with domain-specific best practices in areas like finance.

AINeutralarXiv – CS AI · Mar 126/10

🧠

FERRET: Framework for Expansion Reliant Red Teaming

Researchers introduce FERRET, a new automated red teaming framework designed to generate multi-modal adversarial conversations to test AI model vulnerabilities. The framework uses three types of expansions (horizontal, vertical, and meta) to create more effective attack strategies and demonstrates superior performance compared to existing red teaming approaches.

← PrevPage 20 of 42Next →