#multi-agent-framework News & Analysis

4 articles tagged with #multi-agent-framework. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

Researchers introduce ClinicalMC, a benchmark dataset designed to evaluate how large language models perform in complex, multi-stage clinical decision-making scenarios where patient conditions evolve over time. The benchmark comprises 7,079 samples drawn from English and Chinese datasets. It uses a multi-agent evaluation framework to test closed-source, open-source, and medically specialized LLMs.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · May 29 · 6/10

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

Researchers introduce SafeRx-Agent, a multi-agent AI framework designed to improve medication recommendation systems by integrating clinical knowledge, safety verification, and explainability. The system addresses limitations of existing approaches by using fine-grained drug classification (ATC codes). On MIMIC datasets, it demonstrates improved accuracy while controlling for drug interactions and contraindications.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation

Researchers introduce MAVEN, a multi-agent framework that improves how accurately text-to-video generation represents multiple cultures within a single prompt. The team contributes a new benchmark of 243 culturally grounded prompts covering Chinese, American, and Romanian cultures. They show that specialized agent-based prompt refinement significantly improves cultural fidelity while maintaining visual quality.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Researchers introduce a multi-agent framework to map data lineage in large language models, revealing how post-training datasets evolve and interconnect. The analysis uncovers structural redundancy and the propagation of benchmark contamination across datasets. The authors propose lineage-aware dataset construction to improve the diversity and quality of LLM training data.