#mllm-benchmark News & Analysis

2 articles tagged with #mllm-benchmark. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Researchers introduce ToxiMol, the first benchmark dataset and evaluation framework for assessing Multimodal Large Language Models (MLLMs) on molecular toxicity repair, the task of generating structurally valid alternatives to toxic compounds. Tests of 43 mainstream MLLMs show that current models are promising at toxicity understanding and constraint adherence but still struggle with this specialized pharmaceutical task.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

Researchers introduce MechVQA, the first comprehensive dataset for evaluating multimodal large language models (MLLMs) on mechanical drawing understanding, containing 3.3k annotated drawings with 21k question-answer pairs across three capability levels. They develop MechVL, a domain-specialized model that outperforms existing baselines by 7.57 percentage points, establishing a foundation for deploying AI in mechanical design and engineering inspection workflows.