MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs
MetaboT is an open-source LLM-based framework that translates natural-language questions into SPARQL queries for metabolomics knowledge graphs, lowering technical barriers for researchers without programming expertise. Its multi-agent architecture addresses hallucination and schema-compliance issues through specialized agents for validation, entity resolution, and query refinement, and it was evaluated on the Experimental Natural Products Knowledge Graph (ENPKG).
MetaboT represents a meaningful advance in making complex scientific data infrastructure accessible to domain experts who lack technical training. Mass spectrometry metabolomics generates large, high-dimensional datasets rich in biological insight, yet the steep learning curve of specialized query languages like SPARQL has traditionally kept that knowledge out of reach. The framework's multi-agent design targets a core weakness of single-model approaches: a lone large language model can hallucinate and frequently violates database schema constraints, which reduces reliability in scientific contexts where accuracy is paramount.
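To make the schema-compliance failure mode concrete, here is a minimal sketch of a check that flags terms a generated query uses but the graph's schema does not define. It is not MetaboT's actual validator. The `ex:` prefix, the predicate names, and the `unknown_terms` helper are all hypothetical and are not taken from the real ENPKG schema.

```python
import re

# Hypothetical schema vocabulary. These names are illustrative only and are
# NOT taken from the real ENPKG schema.
ALLOWED_TERMS = {
    "ex:has_extract",
    "ex:has_feature",
    "ex:has_annotation",
}

# Matches prefixed names such as ex:has_feature.
PREFIXED_NAME = re.compile(r"\b[A-Za-z][\w-]*:[A-Za-z_][\w-]*")


def unknown_terms(sparql: str, allowed: set) -> set:
    """Return prefixed names used in the query that the schema does not define."""
    # Drop PREFIX declarations and full IRIs so they are not mistaken for terms.
    body = re.sub(r"(?im)^\s*PREFIX\s+\S+\s+<[^>]*>\s*$", "", sparql)
    body = re.sub(r"<[^>]*>", "", body)
    return set(PREFIXED_NAME.findall(body)) - allowed


# A query an LLM might plausibly produce, containing one invented predicate.
candidate_query = """
PREFIX ex: <http://example.org/schema#>
SELECT ?feature WHERE {
  ?sample ex:has_extract ?extract .
  ?extract ex:has_feature ?feature .
  ?feature ex:has_bioactivity ?activity .
}
"""

violations = unknown_terms(candidate_query, ALLOWED_TERMS)
print(violations)
```

Here the invented predicate `ex:has_bioactivity` is reported as a violation. A production system would more likely parse the query with a proper SPARQL parser than rely on a regular expression, but the principle of checking generated terms against the known schema is the same.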
The architecture's modular approach, with separate agents handling scope validation, entity resolution against authoritative databases, schema-aware query generation, and iterative refinement, addresses real failure modes in naive LLM-to-query systems. It reflects the growing maturity of LLM application patterns, which are moving beyond simple conversational interfaces toward specialized technical problem-solving.
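The four stages named above can be sketched as a small pipeline with a bounded refinement loop. This is an illustrative skeleton under stated assumptions, not MetaboT's implementation: `run_pipeline`, `PipelineResult`, the toy stage functions, and the `ex:` identifiers are all hypothetical, and the LLM-backed agents are replaced by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PipelineResult:
    query: Optional[str]   # accepted query, or None if rejected or never validated
    attempts: int          # number of generation attempts made
    log: list = field(default_factory=list)


def run_pipeline(
    question: str,
    in_scope: Callable[[str], bool],
    resolve_entities: Callable[[str], dict],
    generate_query: Callable[[str, dict, list], str],
    validate: Callable[[str], list],
    max_attempts: int = 3,
) -> PipelineResult:
    log = []
    # Stage 1: scope validation rejects questions the graph cannot answer.
    if not in_scope(question):
        log.append("rejected: out of scope")
        return PipelineResult(None, 0, log)
    # Stage 2: entity resolution maps names in the question to identifiers.
    entities = resolve_entities(question)
    log.append(f"resolved {len(entities)} entities")
    # Stages 3 and 4: generate a query, validate it, and feed errors back.
    errors = []
    for attempt in range(1, max_attempts + 1):
        query = generate_query(question, entities, errors)
        errors = validate(query)
        if not errors:
            log.append(f"accepted on attempt {attempt}")
            return PipelineResult(query, attempt, log)
        log.append(f"attempt {attempt} failed: {errors}")
    return PipelineResult(None, max_attempts, log)


# Toy stand-ins for the agents (hypothetical; a real system would call an LLM).
def toy_generate(question, entities, errors):
    # The first draft uses a wrong predicate; feedback triggers the correction.
    predicate = "ex:has_metabolite" if errors else "ex:contains"
    taxon = next(iter(entities.values()))
    return f"SELECT ?m WHERE {{ {taxon} {predicate} ?m }}"


def toy_validate(query):
    return [] if "ex:has_metabolite" in query else ["unknown predicate in query"]


result = run_pipeline(
    "Which metabolites are found in the example plant?",
    in_scope=lambda q: "metabolite" in q.lower(),
    resolve_entities=lambda q: {"example plant": "ex:taxon_1"},
    generate_query=toy_generate,
    validate=toy_validate,
)
print(result.attempts, result.query)
```

Passing each stage in as a callable keeps the control flow independent of any particular model or knowledge graph, and capping the loop with `max_attempts` ensures a query that never validates is reported as a failure rather than returned unchecked.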
For the research community, MetaboT democratizes semantic data mining across metabolomics datasets, enabling plant biologists and natural products researchers to extract insights previously requiring bioinformaticians or computational chemists. The validation against the ENPKG and expert-authored benchmarks provides credibility beyond theoretical claims. This pattern of expert-curated validation is increasingly important as LLM applications move into specialized scientific domains where errors carry meaningful consequences.
The broader implication extends beyond metabolomics: this architecture offers a replicable template for translating natural language to domain-specific query languages across other knowledge graph applications in biology, chemistry, and pharmaceutical research. Success here could accelerate adoption of knowledge graph technologies in life sciences by removing the technical gatekeeping that has historically limited their user base.
- MetaboT uses a multi-agent LLM architecture to translate natural language into SPARQL queries, addressing the hallucination and schema-compliance problems of single-model approaches.
- The framework enables metabolomics researchers without programming expertise to perform semantic data mining on knowledge graphs.
- Validation on the Experimental Natural Products Knowledge Graph demonstrates capability for complex queries about plant-metabolite relationships and biological activities.
- The modular agent design provides a replicable template for applying LLMs to other specialized query languages and scientific knowledge graphs.
- The open-source release increases accessibility and the potential for community-driven improvements in life sciences research workflows.