Exploration Structure in LLM Agents for Multi-File Change Localization
Researchers compare linear versus non-linear exploration strategies for LLM agents tasked with localizing files requiring changes to resolve software issues. Domain-scoped parallel agent spawning with smaller models achieves competitive performance against larger models while reducing costs, revealing that repository exploration structure significantly impacts software engineering task efficiency.
This research addresses an architectural limitation in how AI agents navigate software repositories. Traditional sequential exploration, in which an agent visits one directory or file at a time, becomes inefficient when an issue spans multiple subsystems. The study proposes domain-scoped parallel agent spawning as an alternative: specialized agents explore different repository domains simultaneously. Using the Ansible project as a case study, the researchers benchmarked their approach against multiple baselines, including larger Codex models and single-agent recursive language model implementations. The findings show that non-linear exploration with smaller Haiku-class models can match or exceed the performance of larger models on expanded benchmarks covering recent GitHub issues. The main gains for AI-driven code understanding are in computational efficiency and cost-effectiveness for developer tools.
The research identifies three critical challenges. First, documentation evolution creates latent dependencies that current approaches cannot resolve. Second, naive file system access can introduce noise through over-prediction of test files. Third, multi-agent consultation strategies increase token costs without measurable performance gains. These limitations suggest that improving exploration structure requires balancing breadth of investigation against precision in file identification, rather than simply adding more agents or computational resources.
The work has implications for development tool providers seeking to deploy LLM-based code analysis at scale, as it indicates that agent architecture matters as much as raw model capacity. The gap between performance on curated benchmarks and on the expanded 2025-2026 data suggests that domain-specific optimization strategies may have diminishing returns as codebases evolve.
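The idea of domain-scoped parallel exploration can be illustrated with a minimal sketch. This is not the researchers' implementation: the toy repository, the file paths, and the names `explore_domain` and `localize` are all hypothetical, and a simple keyword match stands in for the LLM call each agent would make. The sketch only shows the structure, one worker per repository domain, with the results merged into a single candidate list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy repository: domain -> {file path: file contents}.
# These paths are illustrative and do not describe any real project layout.
REPO = {
    "lib/modules": {
        "lib/modules/copy.py": "def copy_file(src, dest): ...  # honors remote_src",
        "lib/modules/file.py": "def set_mode(path, mode): ...",
    },
    "lib/plugins": {
        "lib/plugins/action/copy.py": "class ActionModule: ...  # remote_src handling",
    },
    "docs": {
        "docs/copy.rst": "The copy module supports remote_src.",
    },
}


def explore_domain(files, issue_terms):
    """Stand-in for one domain-scoped agent.

    A real agent would query a language model; here a keyword match
    (an assumption made for illustration) selects candidate files.
    """
    return [
        path
        for path, text in files.items()
        if any(term in text for term in issue_terms)
    ]


def localize(repo, issue_terms, max_workers=4):
    """Explore every domain concurrently and merge the candidate files."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(explore_domain, files, issue_terms)
            for files in repo.values()
        ]
        candidates = set()
        for future in futures:
            candidates.update(future.result())
    return sorted(candidates)


if __name__ == "__main__":
    print(localize(REPO, ["remote_src"]))
```

A sequential baseline would call `explore_domain` on each domain in turn. The parallel version returns the same set of files, so in this toy setting any difference is in wall-clock time rather than in the result.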
- Domain-scoped parallel agent exploration outperforms linear sequential exploration for multi-subsystem code changes
- Smaller language models with optimized exploration structure compete with much larger models while reducing computational costs
- Naive file system access degrades localization accuracy through test-file over-prediction bias
- Documentation evolution remains an unresolved dependency that current LLM agent approaches cannot address
- Multi-agent consultation strategies increase token expenditure without proportional performance improvements
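The test-file over-prediction point can be made concrete with a small worked example. The sketch below assumes that localization is scored with set-based precision, recall, and F1 over file paths. The source does not state which metric the study uses, so this scoring and the file names are assumptions for illustration only.

```python
def localization_scores(predicted, gold):
    """Return (precision, recall, f1) for predicted versus gold file sets."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical gold set: the files that actually changed for an issue.
gold = ["lib/copy.py", "lib/action.py"]

# A focused prediction versus one padded with unneeded test files.
focused = ["lib/copy.py", "lib/action.py"]
padded = focused + ["test/test_copy.py", "test/test_action.py"]

print(localization_scores(focused, gold))
print(localization_scores(padded, gold))
```

Both predictions find every changed file, so recall is unchanged, but the two extra test files halve the precision of the padded prediction. Under this assumed metric, broader file access can lower the score without the agent missing anything.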