Snippet-Driven Supply Chain Discovery with LLMs: Scaling Visibility in China
Researchers propose a snippet-driven method that uses large language models to construct supply chain knowledge graphs for Chinese firms. For listed firms, the resulting graph covers 7.2× more firms than disclosure-based benchmarks, while computational costs fall by a factor of 251 compared to full-text processing.
This research addresses a gap in supply chain visibility for Chinese firms: unlisted companies and minor business relationships remain largely invisible in structured financial databases. Traditional approaches rely either on mandatory disclosures from listed firms or on expensive full-text web mining, leaving large portions of inter-firm networks unmapped. The proposed method uses web search snippets (the brief summaries returned alongside search results) as an efficient first-pass layer for LLM-based relationship extraction, substantially reducing token consumption and processing costs.
Snippet-driven extraction discovers fewer total relationships than exhaustive full-text analysis, but it is far more efficient and scales in practice to 130,685 Chinese firms. For the listed-firm subset, the resulting supply chain knowledge graph covers 7.2× more firms and 9.3× more relationships than the CSMAR disclosure benchmark, revealing previously hidden heavy-tailed network patterns. Retained provenance metadata makes the graph auditable, positioning it as a trustworthy complement to traditional databases rather than a replacement.
The advance matters for financial researchers, risk analysts, and investors assessing supply chain exposure in opaque markets. It is particularly useful for evaluating geopolitical or sanctions-related supply chain vulnerabilities, where visibility into indirect relationships is crucial. As LLM costs decline and efficiency improves, snippet-driven approaches could become standard for mapping complex economic networks in regions with limited disclosure requirements.
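The snippet-first idea described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the `Snippet` and `Edge` types, the `extract_edges` and `build_graph` helpers, the example URLs, and the `fake_llm_extract` stand-in (a simple string pattern used in place of a real LLM call) are all hypothetical names chosen for illustration. The sketch only shows the general shape of the pipeline: run an extractor over short snippets rather than full pages, and keep the source URL and evidence text with every extracted relationship so it can be audited later.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Snippet:
    url: str   # where the search result points
    text: str  # the short summary shown with the search result


@dataclass(frozen=True)
class Edge:
    supplier: str
    customer: str
    source_url: str  # provenance: which result produced this edge
    evidence: str    # provenance: the snippet text it was read from


def extract_edges(
    snippets: Iterable[Snippet],
    llm_extract: Callable[[str], list[tuple[str, str]]],
) -> list[Edge]:
    """Run the extractor on each snippet and attach provenance to every edge."""
    edges = []
    for snippet in snippets:
        for supplier, customer in llm_extract(snippet.text):
            edges.append(Edge(supplier, customer, snippet.url, snippet.text))
    return edges


def build_graph(edges: Iterable[Edge]) -> dict[tuple[str, str], list[Edge]]:
    """Group edges by (supplier, customer) so each pair keeps all its evidence."""
    graph = defaultdict(list)
    for edge in edges:
        graph[(edge.supplier, edge.customer)].append(edge)
    return dict(graph)


def fake_llm_extract(text: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for an LLM call: matches 'X supplies Y.' only."""
    if " supplies " in text:
        supplier, customer = text.split(" supplies ", 1)
        return [(supplier.strip(), customer.strip().rstrip("."))]
    return []


# Made-up snippets, purely for illustration.
snippets = [
    Snippet("https://example.com/a", "Firm A supplies Firm B."),
    Snippet("https://example.com/b", "Firm A supplies Firm B."),
    Snippet("https://example.com/c", "Firm C announced quarterly results."),
]
graph = build_graph(extract_edges(snippets, fake_llm_extract))
```

In this toy run, the relationship between Firm A and Firm B is stored once as a graph key, with both supporting snippets kept as evidence. Keeping evidence per edge is one plausible way to support the auditability the article describes; the actual metadata schema used in the research is not specified here.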
- Snippet-driven LLM method reduces token consumption by 251× versus full-text processing while maintaining practical coverage gains
- Supply chain knowledge graph reveals 7.2× more firms and 9.3× more relationships than official Chinese disclosure databases
- Method demonstrates scalability across 130,685 Chinese firms, covering both listed and major unlisted companies as of 2024
- Web search snippets enable efficient relationship extraction at scale without expensive full-text page processing
- Auditable provenance metadata creates a trustworthy complement to traditional disclosure-based supply chain databases