904 articles tagged with #research. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduced SOAR, a self-improving language model system that combines evolutionary search with hindsight learning for program synthesis tasks. The method achieved a 52% success rate on the challenging ARC-AGI benchmark by iteratively improving through search and refinement cycles.
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce AVA-Bench, a new benchmark that evaluates vision foundation models (VFMs) by testing 14 distinct atomic visual abilities such as localization and depth estimation. This approach provides a more precise assessment than traditional VQA benchmarks and reveals that smaller 0.5B language models can evaluate VFMs as effectively as 7B models while using 8x fewer GPU resources.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce PRIMO R1, a 7B parameter AI framework that transforms video MLLMs from passive observers into active critics for robotic manipulation tasks. The system uses reinforcement learning to achieve 50% better accuracy than specialized baselines and outperforms 72B-scale models, establishing state-of-the-art performance on the RoboFail benchmark.
🏢 OpenAI · 🧠 o1
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce Mask Fine-Tuning (MFT), a novel approach that improves large language model performance by applying binary masks to optimized models without updating weights. The method achieves consistent performance gains across different domains and model architectures, with average improvements of 2.70/4.15 in IFEval benchmarks for LLaMA models.
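The core idea above can be sketched in a few lines. Note the hedge: the paper's MFT learns which mask entries to keep, while this toy uses a magnitude heuristic as a stand-in; `magnitude_mask` and the layer shapes are our own illustrative choices.

```python
import numpy as np

# Illustrative sketch only: MFT learns a binary mask over a frozen,
# already-optimized model; here the mask is picked by weight magnitude,
# a common stand-in heuristic. The weights themselves are never updated.
rng = np.random.default_rng(0)

def magnitude_mask(W, keep_ratio):
    """Return a 0/1 mask keeping the largest-magnitude entries of W."""
    k = int(keep_ratio * W.size)
    threshold = np.sort(np.abs(W), axis=None)[-k]
    return (np.abs(W) >= threshold).astype(W.dtype)

W = rng.normal(size=(8, 8))            # frozen, pre-optimized layer
mask = magnitude_mask(W, keep_ratio=0.5)
x = rng.normal(size=8)
y = (W * mask) @ x                     # masked forward pass; W untouched
```

Only the mask is the trainable object in this setup, which is what keeps the method cheap: the frozen weights are shared across all masked variants.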
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce AgentDiet, a trajectory reduction technique that cuts computational costs for LLM-based agents by 39.9%-59.7% in input tokens and 21.1%-35.9% in total costs while maintaining performance. The approach removes redundant and expired information from agent execution trajectories during inference time.
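A minimal sketch of that kind of trajectory reduction, assuming a simple step format of our own invention (the keys, step kinds, and supersession heuristic are illustrative, not the paper's algorithm):

```python
# Toy trajectory reduction in the spirit of AgentDiet: drop redundant
# duplicates and treat earlier observations as expired once a later
# step with the same identity supersedes them (e.g. re-reading a file).
def reduce_trajectory(steps):
    """steps: list of dicts with 'kind', 'key', 'text' in execution order."""
    latest = {}
    order = []
    for step in steps:
        ident = (step["kind"], step["key"])
        if ident not in latest:
            order.append(ident)        # remember first-seen position
        latest[ident] = step           # later entries supersede earlier
    return [latest[i] for i in order]

trajectory = [
    {"kind": "obs", "key": "app.py", "text": "v1 contents"},
    {"kind": "act", "key": "edit-1", "text": "patch app.py"},
    {"kind": "obs", "key": "app.py", "text": "v2 contents"},  # expires v1
    {"kind": "act", "key": "edit-1", "text": "patch app.py"},  # redundant
]
reduced = reduce_trajectory(trajectory)
```

Because the reduction runs at inference time, every dropped step saves input tokens on each subsequent model call.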
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 An NSF workshop community paper outlines strategic priorities for strengthening the intersection between artificial intelligence and mathematical/physical sciences (AI+MPS). The report proposes three key activities: enabling bidirectional AI+MPS research, building interdisciplinary communities, and fostering education and workforce development in this rapidly evolving field.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce EcoAlign, a new framework for aligning Large Vision-Language Models that treats alignment as an economic optimization problem. The method balances safety, utility, and computational costs while preventing harmful reasoning disguised with benign justifications, showing superior performance across multiple models and datasets.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce MapReduce LoRA and Reward-aware Token Embedding (RaTE) to optimize multiple preferences in generative AI models without degrading performance across dimensions. The methods show significant improvements across text-to-image, text-to-video, and language tasks, with gains ranging from 4.3% to 136.7% on various benchmarks.
🧠 Llama · 🧠 Stable Diffusion
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers identify a fundamental flaw in large language models called 'Rung Collapse' where AI systems achieve correct answers through flawed causal reasoning that fails under distribution shifts. They propose Epistemic Regret Minimization (ERM) as a solution that penalizes incorrect reasoning processes independently of task success, showing 53-59% recovery of reasoning errors in experiments across six frontier LLMs.
🧠 GPT-5
AI × Crypto · Neutral · Decrypt · AI · Mar 16 · 7/10
🤖 IBM is expanding access to its quantum computing processors for researchers and developers. This development comes as the cryptocurrency community prepares for potential future threats quantum computing may pose to Bitcoin's current cryptographic security systems.
$BTC
AI · Bullish · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers developed a new reinforcement learning approach for training diffusion language models that uses entropy-guided step selection and stepwise advantages to overcome challenges with sequence-level likelihood calculations. The method achieves state-of-the-art results on coding and logical reasoning benchmarks while being more computationally efficient than existing approaches.
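One plausible reading of "entropy-guided step selection" can be sketched as follows; this is our simplification, not the authors' exact procedure, and the stand-in logits and selection rule are assumptions:

```python
import numpy as np

# Hedged sketch: score each masked position by the entropy of the
# model's token distribution, then commit the most confident
# (lowest-entropy) positions first at each denoising step.
rng = np.random.default_rng(4)
n_positions, vocab = 6, 50
logits = rng.normal(size=(n_positions, vocab))  # stand-in model outputs

def token_entropy(logits):
    """Shannon entropy of the softmax distribution per position."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p)).sum(axis=-1)

H = token_entropy(logits)
order = np.argsort(H)           # unmask low-entropy positions first
```

The appeal of such a rule is that it replaces an intractable sequence-level likelihood with cheap per-position uncertainty scores.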
AI · Bullish · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers propose a new family of learnable Koopman operators that combine linear dynamical systems theory with deep learning for time series forecasting. The approach integrates with existing transformer architectures like PatchTST and Autoformer, offering improved stability and interpretability in predictive models.
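The Koopman idea behind this line of work can be shown in miniature. This is a toy version under our own assumptions: a delay embedding stands in for the learned lifting (the paper learns its observables with deep networks), and the operator is fit by plain least squares.

```python
import numpy as np

# Toy Koopman/DMD-style forecast: lift the series into a small state,
# fit a linear operator K by least squares, and roll K forward.
t = np.linspace(0.0, 8 * np.pi, 400)
x = np.sin(t)                              # observed series

# Delay-embedded states z_t = [x_t, x_{t-1}]
Z = np.stack([x[1:-1], x[:-2]])            # states at steps 1..n-2
Z_next = np.stack([x[2:], x[1:-1]])        # the same states one step later

K, *_ = np.linalg.lstsq(Z.T, Z_next.T, rcond=None)
K = K.T                                    # z_{t+1} ≈ K z_t

z = np.array([x[-1], x[-2]])
pred = (K @ z)[0]                          # one-step forecast of x
```

Because a sinusoid obeys an exact linear two-step recurrence, the fitted K forecasts it almost perfectly; the interpretability claim comes from K being a plain linear map whose eigenvalues describe the dynamics.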
AI · Neutral · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers introduce HCP-DCNet, a new AI framework that combines physical dynamics with symbolic causal reasoning to enable AI systems to understand cause-and-effect relationships. The system uses hierarchical causal primitives and can self-improve through interventions, potentially addressing current limitations in AI's ability to handle distribution shifts and counterfactual reasoning.
AI · Bullish · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers propose Active Causal Structure Learning with Latent Variables (ACSLWL) as a necessary component for building AGI agents and robots. The paper demonstrates how this approach enables simulated robots to learn complex detour behaviors when encountering unexpected obstacles, allowing them to adapt to new environments by constructing internal causal models.
AI · Bullish · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers used mechanistic interpretability techniques to demonstrate that transformer language models have distinct but interacting neural circuits for recall (retrieving memorized facts) and reasoning (multi-step inference). Through controlled experiments on Qwen and LLaMA models, they showed that disabling specific circuits can selectively impair one ability while leaving the other intact.
AI · Bullish · arXiv · CS AI · Mar 12 · 7/10
🧠 Researchers have developed HTMuon, an improved optimization algorithm for training large language models that builds upon the existing Muon optimizer. HTMuon addresses limitations in Muon's weight spectra by incorporating heavy-tailed spectral corrections, showing up to 0.98 perplexity reduction in LLaMA pretraining experiments.
🏢 Perplexity
AI · Bearish · arXiv · CS AI · Mar 12 · 7/10
🧠 A study finds that LLaMA-70B-Instruct hallucinated in 19.7% of medical Q&A responses despite high plausibility scores, highlighting significant reliability issues in AI healthcare applications. The study also shows that lower hallucination rates correlate with higher usefulness scores, underscoring the need for better safeguards in medical AI systems.
AI · Bearish · arXiv · CS AI · Mar 12 · 7/10
🧠 A large-scale study of 62,808 AI safety evaluations across six frontier models reveals that deployment scaffolding architectures can significantly impact measured safety, with map-reduce scaffolding degrading safety performance. The research found that evaluation format (multiple-choice vs open-ended) affects safety scores more than scaffold architecture itself, and safety rankings vary dramatically across different models and configurations.
AI · Bullish · arXiv · CS AI · Mar 12 · 7/10
🧠 Researchers propose a novel lightweight architecture for verifiable aggregation in federated learning that uses backdoor injection as intrinsic proofs instead of expensive cryptographic methods. The approach achieves over 1000x speedup compared to traditional cryptographic baselines while maintaining high detection rates against malicious servers.
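The backdoor-as-proof idea can be illustrated with a deliberately simplified model. Everything here is a hypothetical construction of ours (linear "models", a shared secret trigger, a mean-aggregation server), meant only to show why verification reduces to a cheap forward pass:

```python
import numpy as np

# Each client plants a secret trigger the honestly aggregated model
# must still respond to; verification checks that response instead of
# running an expensive cryptographic proof of correct aggregation.
rng = np.random.default_rng(2)
dim = 32

trigger = rng.normal(size=dim)            # secret, known only to clients

def client_update():
    w = rng.normal(size=dim) * 0.1        # stand-in "task" weights
    return w + trigger                    # implant the intrinsic proof

updates = [client_update() for _ in range(10)]
honest = np.mean(updates, axis=0)         # faithful FedAvg aggregation
malicious = rng.normal(size=dim) * 0.1    # server substitutes the model

def verify(w, threshold=0.5):
    """Accept iff the model still 'fires' on the secret trigger."""
    return trigger @ w / (trigger @ trigger) > threshold
```

An honest average preserves the implanted trigger response, while a substituted model almost surely does not, which is the source of the claimed detection rate against malicious servers.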
AI · Bullish · MIT News · AI · Mar 11 · 7/10
🧠 MIT Professor Jesse Thaler outlines a vision for creating a bidirectional relationship between artificial intelligence and mathematical/physical sciences. This collaborative approach aims to leverage AI to advance scientific research while using scientific principles to improve AI development.
AI · Neutral · arXiv · CS AI · Mar 11 · 7/10
🧠 Researchers have identified a phenomenon called 'merging collapse' where combining independently fine-tuned large language models leads to catastrophic performance degradation. The study reveals that representational incompatibility between tasks, rather than parameter conflicts, is the primary cause of merging failures.
AI · Bearish · arXiv · CS AI · Mar 11 · 7/10
🧠 Researchers introduce the RAISE framework showing how improvements in AI logical reasoning capabilities directly lead to increased situational awareness in language models. The paper identifies three mechanistic pathways through which better reasoning enables AI systems to understand their own nature and context, potentially leading to strategic deception.
AI · Bullish · arXiv · CS AI · Mar 11 · 7/10
🧠 Researchers developed EyExIn, a new AI framework that addresses critical gaps in large vision language models for medical diagnosis by anchoring them with domain-specific expert knowledge. The system uses dual-stream encoding and deep expert injection to improve accuracy in ophthalmic diagnosis, outperforming existing proprietary systems across four benchmarks.
AI · Bullish · arXiv · CS AI · Mar 11 · 7/10
🧠 AlphaApollo is a new AI reasoning system that addresses limitations in foundation models through multi-turn agentic reasoning, learning, and evolution components. The system demonstrates significant performance improvements across math reasoning benchmarks, with success rates exceeding 85% for tool calls and substantial gains from reinforcement learning across different model scales.
AI · Neutral · arXiv · CS AI · Mar 11 · 7/10
🧠 Researchers introduce Bag-of-Words Superposition (BOWS) to study how neural networks arrange features in superposition when using realistic correlated data. The study reveals that interference between features can be constructive rather than just noise, leading to semantic clusters and cyclical structures observed in language models.
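Superposition itself is easy to demonstrate in a toy setting. The construction below is ours, not the paper's BOWS setup: it just crams more feature directions into a space than it has dimensions, so readouts necessarily interfere.

```python
import numpy as np

# 8 features stored in a 4-dimensional space: feature directions must
# overlap, so reading out one feature leaks into the others.
rng = np.random.default_rng(3)
n_features, n_dims = 8, 4

# Unit-norm feature directions crammed into too few dimensions
D = rng.normal(size=(n_features, n_dims))
D /= np.linalg.norm(D, axis=1, keepdims=True)

x = np.zeros(n_features)
x[0] = 1.0                       # activate a single feature

h = D.T @ x                      # compressed hidden state
readout = D @ h                  # attempt to recover all features

# The active feature reads back exactly; the rest show interference,
# which the paper argues can be structured rather than pure noise.
interference = np.delete(readout, 0)
```

In the BOWS framing, the interesting question is when that interference is constructive, i.e. when the overlap pattern between directions mirrors semantic structure in the data rather than behaving like random noise.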