#api-agents News & Analysis

2 articles tagged with #api-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 12 · 6/10

GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models

Researchers introduced GeoNatureAgent Benchmark, the first evaluation framework for AI agents performing environmental geospatial analysis through real API interactions. In tests of seven major LLMs across 93 tasks, Claude Sonnet 4 achieved 60.8% accuracy, while DeepSeek V3.2 delivered 93% of Claude's capability at 11x lower cost. The results reveal significant performance gaps in structured reasoning tasks.

🧠 Claude · 🧠 Sonnet · 🧠 Gemini

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Trajectory Supervision for Continual Tool-Use Learning in LLMs

Researchers demonstrate that preserving API request/response trajectories during continual learning significantly improves tool-use performance in language models. When Llama 3.1 8B is fine-tuned on sequential API domains, trajectory supervision achieves 56.9% accuracy versus 39.2% without intermediate context, at the cost of a 25.1% increase in tokens.

🧠 Llama