PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering
PathISE is a novel framework that enables knowledge graph question-answering systems to learn effective supervision signals from answer-level labels alone, eliminating the need for expensive intermediate annotations. By using a transformer-based estimator to identify informative relation paths and distilling them into LLM path generators, the approach achieves competitive or state-of-the-art performance while reducing the resources required for training.
PathISE addresses a critical bottleneck in knowledge graph question-answering systems: the expensive process of obtaining high-quality intermediate supervision signals. Traditional KGQA approaches require manually annotated question-relevant paths or subgraphs, creating a labor-intensive data collection burden. This new framework circumvents that requirement by learning to estimate path informativeness directly from final answer labels, substantially reducing costs while maintaining performance quality.
The technical contribution centers on a lightweight transformer-based estimator that scores relation paths by how informative they are for answering a given question. Rather than relying on human annotation or costly LLM-refined supervision, PathISE generates pseudo path-level supervision automatically from these scores. This pseudo supervision is then distilled into an LLM path generator, which produces compact evidence grounded in the knowledge graph for answer reasoning.
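The idea of deriving path-level supervision from answer labels alone can be illustrated with a minimal Python sketch. It is not the PathISE implementation: the learned transformer estimator is replaced here by a simple stand-in heuristic (the F1 overlap between the entities a relation path reaches and the gold answers), and the toy triples, function names, and prompt format are all hypothetical.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. All names are hypothetical.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "starred", "Leonardo DiCaprio"),
    ("Christopher Nolan", "born_in", "London"),
    ("Leonardo DiCaprio", "born_in", "Los Angeles"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def enumerate_paths(index, start, max_hops):
    """Return {relation_path: entities reached} for paths of 1..max_hops hops."""
    reached = {}
    frontier = {(): {start}}
    for _ in range(max_hops):
        next_frontier = defaultdict(set)
        for path, entities in frontier.items():
            for entity in entities:
                for relation, tail in index.get(entity, []):
                    next_frontier[path + (relation,)].add(tail)
        reached.update(next_frontier)
        frontier = next_frontier
    return reached


def answer_f1(reached, answers):
    """Stand-in informativeness score: F1 between reached entities and answers."""
    hits = len(reached & answers)
    if hits == 0:
        return 0.0
    precision = hits / len(reached)
    recall = hits / len(answers)
    return 2 * precision * recall / (precision + recall)


def pseudo_path_labels(triples, topic_entity, answers, max_hops=2, top_k=1):
    """Pick the top-k scoring relation paths using only answer-level labels."""
    index = build_index(triples)
    answer_set = set(answers)
    scored = [
        (answer_f1(entities, answer_set), path)
        for path, entities in enumerate_paths(index, topic_entity, max_hops).items()
    ]
    # Highest score first; ties go to shorter, then alphabetically earlier paths.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return [(path, score) for score, path in scored[:top_k] if score > 0]


def to_training_pair(question, path):
    """Format one (question, path) pair as a hypothetical generator fine-tuning example."""
    return {
        "prompt": f"Question: {question}\nRelation path:",
        "target": " -> ".join(path),
    }


question = "Where was the director of Inception born?"
labels = pseudo_path_labels(TRIPLES, "Inception", ["London"])
pairs = [to_training_pair(question, path) for path, _ in labels]
```

Here the only supervision supplied is the answer `London`, yet the selected pseudo label is the two-hop path `directed_by -> born_in`. In the framework described above, a learned estimator would take the place of `answer_f1`, and pairs like those in `pairs` would serve as training targets for the LLM path generator.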
The broader significance lies in democratizing KGQA system development. By eliminating the need for expensive intermediate annotations, the framework lowers the barrier to entry for organizations building question-answering systems over structured knowledge. The supervision signals PathISE generates are reusable and can be used to enhance existing KGQA models, so the benefit is not limited to PathISE's own pipeline. Validation across three major KGQA benchmarks demonstrates competitive or superior performance compared to existing methods.
Looking forward, this approach suggests a broader trend toward efficient knowledge distillation in AI systems. As organizations increasingly rely on LLM-augmented reasoning over structured data, methods that reduce annotation costs while maintaining quality become critical infrastructure. The open-sourcing of PathISE code enables rapid adoption and iteration by the research community.
- PathISE learns intermediate supervision from answer-level labels alone, eliminating expensive manual annotation requirements
- A lightweight transformer estimator identifies informative relation paths to create pseudo path-level supervision signals
- The framework achieves competitive or state-of-the-art performance on three KGQA benchmarks without LLM-refined supervision
- Generated supervision signals are reusable and can enhance existing KGQA models across applications
- The approach significantly reduces resource costs and barriers to entry for developing knowledge graph question-answering systems