#research News & Analysis

The #research tag covers 919 indexed articles, with 15 published in the last 30 days. Recent coverage remains predominantly neutral at 73.3%, though bullish sentiment has declined 33.7 percentage points compared to the previous quarter, suggesting a cooling in tone. ArXiv's computer science and AI section dominates the source list, alongside research updates from Microsoft and OpenAI. Gemini, Llama, and GPT-4 are the most frequently discussed models in tagged articles, which often intersect with #machine-learning, #llm, and #artificial-intelligence topics. Cryptocurrency tokens including NEAR, LINK, and ETH appear regularly alongside this tag. Scan the article list below to explore recent developments.

sentiment · last 30d (15 articles) · -33.7pp bullish vs prior 90d

Top sources: arXiv – CS AI (770) · Microsoft Research Blog (3) · OpenAI News (3) · MIT News – AI (3) · The Register – AI (2)

Often co-tagged with: #machine-learning #llm #arxiv #artificial-intelligence #computer-vision #ai

Most-discussed entities: Gemini (12) · Llama (11) · GPT-4 (8) · Claude (8) · GPT-5 (7)

1035 articles

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement

Researchers have developed a new audio-visual speech enhancement framework that uses Large Language Models and reinforcement learning to improve speech quality. The method outperforms existing baselines by using LLM-generated natural language feedback as rewards for model training, providing more interpretable optimization compared to traditional scalar metrics.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Not All Latent Spaces Are Flat: Hyperbolic Concept Control

Researchers introduced HyCon, a hyperbolic control mechanism for text-to-image models that provides better safety controls by steering generation away from unsafe content. The technique uses hyperbolic representation spaces instead of traditional Euclidean adjustments, achieving state-of-the-art results across multiple safety benchmarks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models

Researchers have identified that multimodal large language models (MLLMs) lose visual focus during complex reasoning tasks, with attention becoming scattered across images rather than staying on relevant regions. They propose a training-free Visual Region-Guided Attention (VRGA) framework that improves visual grounding and reasoning accuracy by reweighting attention to question-relevant areas.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Researchers introduce Decoupled Gradient Policy Optimization (DGPO), a new reinforcement learning method that improves large language model training by using probability gradients instead of log-probability gradients. The technique addresses instability issues in current methods while maintaining exploration capabilities, showing superior performance across mathematical benchmarks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows

Researchers introduce the Infinite Problem Generator (IPG), an AI framework that creates verifiable physics problems using executable Python code instead of probabilistic text generation. The accompanying dataset, ClassicalMechanicsV1, contains 1,335 physics problems and is used to show how code complexity can serve as a precise measure of problem difficulty when training large language models.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

PA³: Policy-Aware Agent Alignment through Chain-of-Thought

Researchers developed PA³, a new method to improve AI assistant alignment with business policies by teaching models to recall and apply relevant rules during reasoning without including full policies in prompts. The approach reduces computational overhead by 40% while achieving 16-point performance improvements over baselines.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

RAZOR: Ratio-Aware Layer Editing for Targeted Unlearning in Vision Transformers and Diffusion Models

Researchers introduce RAZOR, a new framework for efficiently removing sensitive information from AI models like CLIP and Stable Diffusion without requiring full retraining. The method selectively edits specific layers and attention heads in transformer models to achieve targeted 'unlearning' while preserving overall performance.

🧠 Stable Diffusion

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

Researchers introduce CLAG, a clustering-based memory framework that helps small language model agents organize and retrieve information more effectively. The system addresses memory dilution issues by creating semantic clusters with automated profiles, showing improved performance across multiple QA datasets.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Agentic workflow enables the recovery of critical materials from complex feedstocks via selective precipitation

Researchers have developed a multi-agentic AI workflow that uses automated instruments and AI agents to recover critical materials from complex feedstocks through selective precipitation. The approach dramatically reduces development timelines from months or years to just days for creating efficient and scalable material separation processes.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems

Researchers introduced InterveneBench, a new benchmark comprising 744 peer-reviewed studies to evaluate large language models' ability to reason about policy interventions and causal inference in social science contexts. Current state-of-the-art LLMs struggle with this type of reasoning, prompting the development of STRIDES, a multi-agent framework that significantly improves performance on these tasks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Conceptual Views of Neural Networks: A Framework for Neuro-Symbolic Analysis

Researchers introduce 'conceptual views' as a formal framework based on Formal Concept Analysis to globally explain neural networks. Testing on 24 ImageNet models and Fruits-360 datasets shows the framework can faithfully represent models, enable architecture comparison, and extract human-comprehensible rules from neurons.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Estimating Causal Effects of Text Interventions Leveraging LLMs

Researchers propose CausalDANN, a novel method using large language models to estimate causal effects of textual interventions in social systems. The approach addresses limitations of traditional causal inference methods when dealing with complex, high-dimensional textual data and can handle arbitrary text interventions even with observational data only.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Researchers conducted the first systematic study on post-training quantization for diffusion large language models (dLLMs), identifying activation outliers as a key challenge for compression. The study evaluated state-of-the-art quantization methods across multiple dimensions to provide insights for efficient dLLM deployment on edge devices.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning

Researchers introduce XQC, a deep reinforcement learning algorithm that achieves state-of-the-art sample efficiency by optimizing the critic network's condition number through batch normalization, weight normalization, and distributional cross-entropy loss. The method outperforms existing approaches across 70 continuous control tasks while using fewer parameters.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

EgoGrasp introduces the first method to reconstruct world-space hand-object interactions from egocentric videos using open-vocabulary objects. The multi-stage framework combines vision foundation models with body-guided diffusion models to achieve state-of-the-art performance in 3D scene reconstruction and hand pose estimation.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Agentic Retoucher for Text-To-Image Generation

Researchers introduce Agentic Retoucher, a new AI framework that fixes common distortions in text-to-image generation through a three-agent system for perception, reasoning, and correction. The system outperformed existing methods on a new 27K-image dataset, potentially improving the quality and reliability of AI-generated images.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Researchers introduce Imagine-then-Plan (ITP), a new AI framework that enables agents to learn through adaptive lookahead imagination using world models. The system allows AI agents to simulate multi-step future scenarios and adjust planning horizons dynamically, significantly outperforming existing methods in benchmark tests.

AI · Bearish · arXiv – CS AI · Mar 17 · 6/10

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

A new research study reveals that AI judges used to evaluate the safety of large language models perform poorly when assessing adversarial attacks, often degrading to near-random accuracy. The research analyzed 6,642 human-verified labels and found that many attacks artificially inflate their success rates by exploiting judge weaknesses rather than generating genuinely harmful content.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science

Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Multi-hop Reasoning and Retrieval in Embedding Space: Leveraging Large Language Models with Knowledge

Researchers propose EMBRAG, a new framework that combines large language models with knowledge graphs to improve reasoning accuracy and reduce hallucinations. The system generates multiple logical rules from queries and applies them in embedding space, achieving state-of-the-art performance on knowledge graph question-answering benchmarks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

MESD: Detecting and Mitigating Procedural Bias in Intersectional Groups

Researchers propose MESD (Multi-category Explanation Stability Disparity), a new metric to detect procedural bias in AI models across intersectional groups. They also introduce the UEF framework, which balances utility, explanation quality, and fairness in machine learning systems.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

The AI Fiction Paradox

A new research paper identifies the 'AI Fiction Paradox': AI models depend heavily on fiction as training data yet struggle to generate quality fiction themselves. The paper outlines three core challenges: narrative causation requiring temporal paradoxes, informational revaluation that conflicts with current attention mechanisms, and multi-scale emotional architecture that current AI cannot orchestrate effectively.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

EviAgent: Evidence-Driven Agent for Radiology Report Generation

Researchers introduce EviAgent, a new AI system for automated radiology report generation that provides transparent, evidence-driven analysis. The system addresses key limitations of current medical AI models by offering traceable decision-making and integrating external domain knowledge, outperforming existing specialized medical models in testing.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

A comprehensive research study examines the relationship between Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) methods for improving Large Language Models after pre-training. The research identifies emerging trends toward hybrid post-training approaches that combine both methods, analyzing applications from 2023-2025 to establish when each method is most effective.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

Researchers introduce AgentProcessBench, the first benchmark for evaluating step-level effectiveness in AI tool-using agents, comprising 1,000 trajectories and 8,509 human-labeled annotations. The benchmark reveals that current AI models struggle with distinguishing neutral and erroneous actions in tool execution, and that process-level signals can significantly enhance test-time performance.
