🧠 AI | Neutral | Importance 6/10

Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy

arXiv – CS AI | Aritra Roy, Enrico Grisan, Chiara Gattinoni, John Buckeridge
🤖 AI Summary

Researchers have extended ComProScanner, an automated materials data extraction framework, with vision-language models (VLMs) so that it can extract composition-property data from scientific figures as well as from text and tables. In tests on piezoelectric ceramics literature, Gemini-3-Flash-Preview reached 97% composition accuracy. The summary describes the result as the first fully multimodal literature mining platform for materials science.

Analysis

ComProScanner's integration of vision-language models addresses a critical gap in scientific data extraction. Automated pipelines have historically focused on text and tabular data, so they missed the quantitative information embedded in figures, which hold a substantial portion of published materials science findings. The expansion matters because materials discovery relies on synthesizing composition-property relationships across thousands of papers, and manual extraction remains time-intensive and error-prone.

The work demonstrates the practical utility of VLMs beyond conversational applications. After evaluating four models on cost and accuracy, the researchers identify Gemini-3-Flash-Preview as the best fit for this use case: it achieves 97% composition accuracy at a cost below $1.50 per million tokens. The introduction of range-based value error thresholds reflects domain-specific evaluation needs. Exact numeric matching is too strict for materials properties, because a value read from a plot can differ slightly from the reference and still be physically equivalent, so counting an extraction as correct when it falls within an acceptable error range gives a more scientifically valid measure.
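The idea behind range-based thresholds can be illustrated with a minimal Python sketch. It is not ComProScanner's implementation: the function names, the 5% relative tolerance, and the sample values below are all assumptions chosen for illustration, and the paper may define its error ranges differently.

```python
# Hypothetical sketch of range-based value matching (not ComProScanner's code).
# The tolerance and sample values are illustrative assumptions.

def within_tolerance(extracted: float, reference: float, rel_tol: float = 0.05) -> bool:
    """Return True when `extracted` lies within +/- rel_tol of `reference`."""
    if reference == 0:
        # A relative tolerance is undefined at zero, so require an exact match.
        return extracted == 0
    return abs(extracted - reference) <= rel_tol * abs(reference)


def value_accuracy(pairs: list[tuple[float, float]], rel_tol: float = 0.05) -> float:
    """Fraction of (extracted, reference) pairs that match within the tolerance."""
    if not pairs:
        return 0.0
    hits = sum(within_tolerance(e, r, rel_tol) for e, r in pairs)
    return hits / len(pairs)


# Made-up (extracted, reference) property values, e.g. read from a plot.
pairs = [(305.0, 300.0), (410.0, 450.0), (120.0, 120.0)]

exact = sum(e == r for e, r in pairs) / len(pairs)   # strict numeric equality
tolerant = value_accuracy(pairs, rel_tol=0.05)       # range-based matching

print(f"exact match: {exact:.2f}, within 5%: {tolerant:.2f}")
```

In this toy data the first pair differs by under 2% and is rejected by exact matching but accepted by the range-based check, while the second pair, which is off by roughly 9%, is rejected by both.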

For materials science researchers and database curators, the value is operational. Automated multimodal extraction speeds up literature mining, enabling faster construction of the composition-property databases that feed machine learning models for materials discovery. A single unified pipeline also reduces the fragmentation that comes from collecting text, table, and figure data with separate tools.

However, the impact remains largely confined to specialized materials research communities rather than broader markets. The framework's success on piezoelectric ceramics still needs validation across diverse material classes and publishing formats. Future development should address extraction from complex multi-panel figures and verify generalization beyond the test corpus. Integration with existing materials databases and accessibility through open APIs will likely determine real-world adoption.

Key Takeaways
  • ComProScanner now extracts composition-property data from scientific figures using vision-language models, extending its coverage from text and tables to all three sources.
  • Gemini-3-Flash-Preview achieved 97% composition accuracy on piezoelectric ceramics literature while offering the best cost-performance ratio among the four evaluated models.
  • Range-based value error thresholds provide more scientifically meaningful evaluation than exact numeric matching for extracted property values.
  • The framework is described as the first materials-specific, fully automated platform to integrate text, tabular, and figure-based data extraction in a single pipeline.
  • Real-world impact depends on validation across diverse material classes and integration with existing materials databases.