ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents
Researchers propose Causal Minimal Tool Filtering (CMTF), a training-free method that improves LLM agent reliability by exposing only the tools needed at each step rather than the entire tool menu. Relative to exposing all tools, the approach reduces token usage by 90% and cuts visible tools from 100 to 1 per step while keeping task success rates comparable to stronger baselines.
The challenge of scaling LLM agents lies less in building more capabilities than in managing the complexity of existing ones. As tool menus expand, agents face a fundamental problem: semantic relevance does not guarantee necessity. A tool related to the user's goal may still introduce errors or inefficiency if it is exposed prematurely. CMTF addresses this through causal reasoning, using lightweight precondition-effect contracts to determine which tools actually advance the current state toward the objective.
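To make the idea of a contract-based tool frontier concrete, here is a minimal sketch in Python. It assumes each tool carries a hypothetical `ToolContract` (a set of precondition facts and a set of effect facts, with no delete effects) and uses a plain breadth-first forward search to find a shortest plan, exposing only that plan's first tool. The search strategy, the class, `minimal_frontier`, and the travel-booking tools are illustrative assumptions, not CMTF's published procedure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical precondition-effect contract for one tool."""
    name: str
    preconditions: frozenset  # facts that must hold before the call
    effects: frozenset        # facts the call makes true (no deletions modeled)


def minimal_frontier(state, goal, tools):
    """Return the one tool to expose next, or None if none is needed or none helps.

    Breadth-first search over fact sets finds a shortest contract-consistent
    plan from `state` to `goal`; only that plan's first tool is exposed.
    Ties between equally short plans are broken by the order of `tools`.
    """
    start, goal = frozenset(state), frozenset(goal)
    if goal <= start:
        return None  # goal already satisfied: expose nothing
    queue = deque([(start, None)])  # (facts reached, first tool on that path)
    seen = {start}
    while queue:
        facts, first = queue.popleft()
        for tool in tools:
            if not tool.preconditions <= facts:
                continue  # contract says the tool cannot run yet
            reached = facts | tool.effects
            if reached in seen:
                continue  # already explored via an equal-or-shorter path
            step_one = tool if first is None else first
            if goal <= reached:
                return step_one
            seen.add(reached)
            queue.append((reached, step_one))
    return None  # goal unreachable under these contracts


# Hypothetical tool menu for a travel-booking task.
EXAMPLE_TOOLS = [
    ToolContract("search_flights", frozenset({"has_dates"}),
                 frozenset({"has_flight_options"})),
    ToolContract("book_flight", frozenset({"has_flight_options", "has_payment"}),
                 frozenset({"flight_booked"})),
    ToolContract("get_payment", frozenset(), frozenset({"has_payment"})),
    ToolContract("weather_lookup", frozenset({"has_dates"}),
                 frozenset({"has_forecast"})),
]

if __name__ == "__main__":
    nxt = minimal_frontier({"has_dates"}, {"flight_booked"}, EXAMPLE_TOOLS)
    print(nxt.name)  # search_flights
```

In this toy setup `weather_lookup` is semantically related to travel but is never exposed, because no shortest plan to `flight_booked` needs it.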
This work reflects a broader recognition that agent reliability depends on decision architecture as much as on model capability. Previous approaches relied on keyword matching or state-aware filtering, which remain computationally expensive or incomplete. The causal approach represents a conceptual shift: from asking "what tools might help" to asking "what minimal tool is sufficient for the next step."
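The shift from relevance to sufficiency can be illustrated by running two filters over the same hypothetical tool list. `keyword_filter` stands in for keyword-matching baselines; `causal_filter` keeps only tools that are callable now and whose effects contribute to a fact the goal still needs, found by a simple backward regression over assumed precondition-effect sets. All names and data are illustrative assumptions, and unlike a true minimal filter, this one-step check can return more than one tool in general.

```python
STOPWORDS = {"a", "an", "the", "for", "my", "of", "to", "with"}

# Hypothetical tools: a text description plus assumed precondition/effect facts.
TOOLS = [
    {"name": "search_flights", "description": "search flight options for given dates",
     "pre": {"has_dates"}, "effects": {"has_flight_options"}},
    {"name": "book_flight", "description": "book a selected flight with payment",
     "pre": {"has_flight_options", "has_payment"}, "effects": {"flight_booked"}},
    {"name": "weather_lookup", "description": "forecast weather for the trip destination",
     "pre": {"has_dates"}, "effects": {"has_forecast"}},
    {"name": "get_payment", "description": "retrieve the stored payment method",
     "pre": set(), "effects": {"has_payment"}},
]


def keyword_filter(goal_text, tools):
    """'What might help': expose every tool sharing a content word with the goal."""
    goal_words = set(goal_text.lower().split()) - STOPWORDS
    return [t["name"] for t in tools
            if goal_words & set(t["description"].lower().split())]


def needed_facts(state, goal, tools):
    """Regress from the goal: collect missing facts some tool chain must produce."""
    state = set(state)
    needed = set(goal) - state
    changed = True
    while changed:  # terminates because `needed` only grows
        changed = False
        for t in tools:
            if t["effects"] & needed:  # tool could produce something still needed
                missing = t["pre"] - state - needed
                if missing:
                    needed |= missing
                    changed = True
    return needed


def causal_filter(state, goal, tools):
    """'What is sufficient next': tools callable now whose effects cover a needed fact."""
    state = set(state)
    needed = needed_facts(state, goal, tools)
    return [t["name"] for t in tools
            if t["pre"] <= state and t["effects"] & needed]


if __name__ == "__main__":
    print(keyword_filter("book a flight for my trip", TOOLS))
    # ['search_flights', 'book_flight', 'weather_lookup']
    print(causal_filter({"has_dates", "has_payment"}, {"flight_booked"}, TOOLS))
    # ['search_flights']
```

The keyword filter exposes `book_flight` before it can run and `weather_lookup` although it never advances the goal; the causal filter exposes neither.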
For developers building production AI agents, this has immediate practical implications. Reducing tool exposure from 100 to 1 per step substantially lowers computational overhead and shrinks the surface for tool-selection errors. The 90% token reduction translates directly into lower deployment cost and latency, which matters more as agents move out of research environments into real-world applications. Multi-step reasoning becomes both cheaper and more reliable.
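One plausible reason a 100-to-1 cut in visible tools yields a token saving near 90% rather than 99% is that each prompt also carries fixed context (instructions, history) that filtering does not touch. The sketch below assumes a simple per-step cost model with hypothetical figures; `prompt_tokens` and `token_reduction` are illustrative helpers, and the numbers do not reproduce the paper's accounting.

```python
def prompt_tokens(base_tokens: int, visible_tools: int, tokens_per_schema: int) -> int:
    """Per-step prompt size: fixed context plus one schema per visible tool (assumed model)."""
    return base_tokens + visible_tools * tokens_per_schema


def token_reduction(base_tokens: int, tokens_per_schema: int,
                    tools_before: int, tools_after: int) -> float:
    """Fractional drop in per-step prompt tokens when fewer tools are shown."""
    before = prompt_tokens(base_tokens, tools_before, tokens_per_schema)
    after = prompt_tokens(base_tokens, tools_after, tokens_per_schema)
    return 1 - after / before


if __name__ == "__main__":
    # Hypothetical figures: 2,000 tokens of fixed context, 120 tokens per tool schema.
    print(f"{token_reduction(2000, 120, 100, 1):.1%}")  # 84.9%
    # With no fixed context, the saving would equal the 99% cut in visible tools.
    print(f"{token_reduction(0, 120, 100, 1):.1%}")     # 99.0%
```

Because the ratio is per step, the same fraction applies across a multi-step run as long as the fixed context stays roughly constant; growing conversation history would shrink it.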
The benchmark scope (102 tasks, 100 tools, four LLM backends, 2,448 runs) shows a thoroughness missing from many AI research papers. CMTF matches stronger baselines while delivering its efficiency gains, which suggests the approach does not sacrifice capability for speed. Future work will likely explore how it scales beyond the tested range and whether causal contracts can be generated automatically rather than defined by hand.
- →CMTF reduces visible tools from 100 to 1 per step while maintaining task success rates comparable to stronger baselines
- →Token usage drops 90% relative to all-tools exposure, significantly lowering deployment costs for LLM agents
- →Causal sufficiency outperforms semantic relevance as a tool-selection criterion in multi-step reasoning tasks
- →Training-free method uses lightweight precondition-effect contracts to determine the minimal necessary tool frontier
- →Comprehensive benchmark across 4 LLM backends and 102 tasks demonstrates broad applicability and robustness