Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
Researchers introduce CFMME, a Chinese financial multimodal evaluation benchmark containing 6,052 instances to assess Large Vision-Language Models' capabilities in financial contexts. Testing shows current state-of-the-art LVLMs achieve 66.11% accuracy on financial question-answering tasks, indicating significant room for improvement in applying these models to real-world financial applications.