Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
Researchers benchmark token-optimized data formats (TRON and TOON) against JSON in agentic AI systems, finding TRON reduces token consumption by up to 27% with acceptable accuracy trade-offs. The study reveals that while these alternatives show promise in isolated tasks, their real-world performance in multi-turn agent loops exposes limitations, particularly with TOON's parsing cascades and parallel tool-call handling.
This benchmark study addresses a practical efficiency problem in agentic AI systems where token consumption directly impacts latency, cost, and model scalability. As LLMs increasingly orchestrate tool use through structured data exchanges, the overhead of JSON's human-readable syntax becomes a meaningful constraint. The research moves beyond theoretical token savings to test whether format optimizations actually work within realistic agent workflows across multiple models and benchmarks.
The findings present a nuanced picture. TRON cuts token consumption by up to 27% while keeping accuracy at a reasonable level, although it falls 14 percentage points below the JSON baseline. This suggests a viable cost-optimization path for production systems that can absorb that loss. However, TOON's vulnerability to parsing failures in multi-turn interactions and its inability to preserve parallel tool calls across most models indicate that token savings cannot come at the expense of structural robustness.
For the AI infrastructure industry, these results validate the growing focus on token efficiency as LLM deployments scale. The work highlights that format selection involves engineering trade-offs rather than simple adoption decisions. Developers building agentic systems must now consider whether the token savings justify potential accuracy losses and parsing brittleness.
The practical implication is that TRON emerges as the more viable alternative for systems that can tolerate a 10-14pp accuracy degradation in exchange for a 20-27% cost reduction. Organizations deploying high-volume agent systems could realize substantial savings, but format migration requires careful validation across their specific model portfolios and task distributions. Future work should address TOON's failure modes and develop hybrid approaches that preserve both efficiency and robustness.
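One way to reason about this trade-off is to compare cost per successful task rather than raw token counts. The sketch below is a back-of-the-envelope model, not something taken from the study. It assumes that cost scales linearly with tokens, that a failed task costs as much as a successful one, and that the headline figures (27% token reduction, 14pp accuracy drop) apply together, which may not hold for any single model or benchmark. The function names are hypothetical.

```python
def cost_per_success_ratio(baseline_accuracy: float,
                           token_reduction: float = 0.27,
                           accuracy_drop: float = 0.14) -> float:
    """Cost per successful task with the compact format, relative to JSON.

    Assumes cost is proportional to tokens and failed tasks are paid for
    in full. A value below 1.0 means the compact format is cheaper per
    successful task; above 1.0 means JSON is cheaper.
    """
    if not 0.0 < baseline_accuracy <= 1.0:
        raise ValueError("baseline_accuracy must be in (0, 1]")
    new_accuracy = baseline_accuracy - accuracy_drop
    if new_accuracy <= 0.0:
        raise ValueError("accuracy_drop leaves no successful tasks")
    # JSON:    cost per success = tokens / baseline_accuracy
    # Compact: cost per success = tokens * (1 - reduction) / new_accuracy
    return (1.0 - token_reduction) * baseline_accuracy / new_accuracy


def break_even_accuracy(token_reduction: float = 0.27,
                        accuracy_drop: float = 0.14) -> float:
    """Baseline accuracy at which both formats cost the same per success.

    Setting the ratio above to 1 and solving for the baseline accuracy
    gives accuracy_drop / token_reduction.
    """
    return accuracy_drop / token_reduction


if __name__ == "__main__":
    print(f"break-even baseline accuracy: {break_even_accuracy():.3f}")
    for acc in (0.5, 0.6, 0.8, 0.95):
        print(f"baseline {acc:.2f} -> relative cost {cost_per_success_ratio(acc):.3f}")
```

Under these assumptions the break-even baseline accuracy is 0.14 / 0.27, roughly 52%. Above that point the token savings outweigh the extra failures; below it, JSON is cheaper per successful task. Systems where a failed task carries costs beyond wasted tokens (retries, user-facing errors) would need a stricter threshold.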
- TRON reduces token consumption by up to 27% at a 14pp accuracy cost, making it viable for cost-sensitive deployments
- TOON achieves an 18% token reduction but suffers from multi-turn parsing cascades and parallel tool-call collapse
- Token optimization gains hold across multiple agentic benchmarks and five open-weight LLMs
- Format selection requires careful trade-off analysis between efficiency gains and accuracy degradation
- JSON alternatives remain context-dependent: TRON performs better than TOON for production agent systems