y0news

#autonomous-research News & Analysis

6 articles tagged with #autonomous-research. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 5h ago · 7/10

Can Coding Agents Reproduce Findings in Computational Materials Science?

Researchers introduced AutoMat, a benchmark that tests whether AI coding agents can reproduce computational materials science findings from academic papers. Current LLM-based agents achieved a success rate of only 54.1%, revealing significant limitations in reconstructing complex scientific workflows, interpreting domain-specific procedures, and validating results against the original claims.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Towards grounded autonomous research: an end-to-end LLM mini research loop on published computational physics

Researchers demonstrate an autonomous LLM agent capable of executing a complete research loop—reading, reproducing, critiquing, and extending computational physics papers. Testing across 111 papers reveals the agent identifies substantive flaws in 42% of cases, with 97.7% of issues requiring actual computation to detect, and produces a publishable peer-review comment on a Nature Communications paper without human direction.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

AlphaLab: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs

AlphaLab is an autonomous research system using frontier LLMs to automate experimental cycles across computational domains. Without human intervention, it explores datasets, validates frameworks, and runs large-scale experiments while accumulating domain knowledge—achieving 4.4x speedups in CUDA optimization, 22% lower validation loss in LLM pretraining, and 23-25% improvements in traffic forecasting.

GPT-5 · Claude · Opus
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2

The FM Agent

Researchers have developed FM Agent, a multi-agent AI framework that combines large language models with evolutionary search to autonomously solve complex research problems. The system achieved state-of-the-art results across multiple domains including operations research, machine learning, and GPU optimization without human intervention.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

Researchers have released LABBench2, an upgraded benchmark with nearly 1,900 tasks designed to measure AI systems' real-world capabilities in biology research beyond theoretical knowledge. Current frontier models score 26-46% lower on the new benchmark than on the original LAB-Bench, which indicates that LABBench2 is considerably harder and that substantial room for improvement remains.

$OP · Hugging Face