AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
AgentFinVQA introduces a multi-agent AI system for financial chart analysis that prioritizes auditability and on-premise deployment alongside accuracy. The system decomposes each query into specialized steps and records all reasoning in traceable evaluation packets, improving accuracy by 7.68 percentage points (pp) over baselines and retaining a 4.84 pp gain with open-weights models.
AgentFinVQA addresses a critical gap in financial AI applications where regulatory compliance and data privacy are non-negotiable requirements. Traditional chart question-answering systems prioritize raw accuracy and often depend on cloud-based APIs and proprietary models, which creates friction for regulated institutions that cannot send sensitive client data to outside providers. This research demonstrates that a specialized multi-agent architecture, which decomposes each task into planning, OCR, legend grounding, visual inspection, and verification, can improve accuracy and auditability at the same time.
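To make the decomposition concrete, here is a minimal sketch of a staged pipeline that logs every step into an audit record. It is an illustration under stated assumptions, not the authors' implementation: the `EvaluationPacket` class, the `run_pipeline` function, and the stub agents are hypothetical names, and the article does not describe the actual Model Evaluation Packet schema. Only the five stage names come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names follow the article's list; everything else here is hypothetical.
STAGES = ["planning", "ocr", "legend_grounding", "visual_inspection", "verification"]

# An agent takes the question plus the outputs of earlier stages and returns text.
Agent = Callable[[str, Dict[str, str]], str]


@dataclass
class EvaluationPacket:
    """Illustrative audit record (assumed shape, not the paper's schema)."""
    question: str
    steps: List[Dict[str, str]] = field(default_factory=list)

    def record(self, stage: str, output: str) -> None:
        # Append one traceable entry per stage, in execution order.
        self.steps.append({"stage": stage, "output": output})


def run_pipeline(question: str, agents: Dict[str, Agent]) -> EvaluationPacket:
    """Run each stage in order, passing earlier outputs forward and logging all of them."""
    packet = EvaluationPacket(question=question)
    context: Dict[str, str] = {}
    for stage in STAGES:
        output = agents[stage](question, context)
        context[stage] = output
        packet.record(stage, output)
    return packet


def make_stub(stage: str) -> Agent:
    """Build a placeholder agent that only reports how much context it received."""
    def stub(question: str, context: Dict[str, str]) -> str:
        return f"{stage} saw {len(context)} prior outputs"
    return stub


# Usage with stub agents standing in for real model calls.
stub_agents = {stage: make_stub(stage) for stage in STAGES}
packet = run_pipeline("What was Q3 revenue?", stub_agents)
for step in packet.steps:
    print(step["stage"], "->", step["output"])
```

Because every stage writes to the same packet before the next one runs, a reviewer can replay an inference step by step, which is the property the audit trail depends on.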
The system's performance gains are substantial: 71.24% accuracy with Gemini 3 Flash and 67.4% with the locally deployable Qwen 3.6-27B both represent meaningful improvements over zero-shot baselines. More importantly, the Model Evaluation Packet framework creates a verifiable audit trail for each inference, addressing institutional requirements around model explainability and regulatory reporting. The verifier also supplies a usable confidence signal: answers it confirms are 68.2% accurate, compared with 55.6% for answers it revises, which makes practical human-in-the-loop workflows possible.
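One way such a signal could drive a review workflow is sketched below. This is an assumption-laden illustration: the `VerifiedAnswer` type, the `"confirmed"` and `"revised"` verdict labels, and the queue names are hypothetical and do not come from the paper. The only grounded input is that revised answers are less accurate than confirmed ones, so they get the closer look.

```python
from dataclasses import dataclass


@dataclass
class VerifiedAnswer:
    answer: str
    verdict: str  # hypothetical labels: "confirmed" or "revised"


def route(result: VerifiedAnswer) -> str:
    """Pick a review queue from the verifier's verdict (queue names are illustrative)."""
    if result.verdict == "confirmed":
        # Higher-accuracy group in the reported numbers: lighter-touch review.
        return "spot_check"
    if result.verdict == "revised":
        # Lower-accuracy group: send to a human analyst first.
        return "priority_review"
    raise ValueError(f"unknown verdict: {result.verdict!r}")


# Usage
print(route(VerifiedAnswer(answer="12.4", verdict="confirmed")))
print(route(VerifiedAnswer(answer="8.1", verdict="revised")))
```

Neither group is accurate enough to skip review entirely under the reported figures, so the sketch changes the depth of human review and never bypasses it.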
The decision to release open-weights model results is particularly significant for enterprise adoption. Many financial institutions face architectural constraints or data governance policies that preclude proprietary API access. By demonstrating that locally deployed open models retain most of the accuracy gains while keeping all data in residence, AgentFinVQA lowers barriers to AI adoption in compliance-heavy sectors.
The error analysis identifies question misunderstanding, legend confusion, and extraction errors as the primary failure modes, which gives future development a clear roadmap. These categories are systematic weaknesses that the current verifier misses, suggesting opportunities for specialized verification components.
- AgentFinVQA combines a +7.68 pp accuracy improvement with full auditability through traceable Model Evaluation Packets for regulated financial environments.
- Open-weights Qwen model deployment achieves 67.4% accuracy with on-premise data residency, removing cloud-dependency barriers for institutional adoption.
- Verifier confidence signals enable effective human-in-the-loop routing: exact accuracy is 68.2% on confirmed answers versus 55.6% on revised ones.
- Multi-agent decomposition (planning, OCR, legend grounding, inspection, verification) proves more effective than single-model approaches for financial chart QA.
- Question misunderstanding and legend confusion account for two-thirds of failures and represent the next priorities for algorithmic improvement.