AI · Neutral · arXiv – CS AI · 11h ago · 6/10
AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
Researchers introduced AD-Bench, a benchmark that evaluates LLM agents on advertising analytics tasks using data from a production platform. It targets the gap between idealized benchmarks and practical agent performance, and shows that state-of-the-art models such as Claude-Opus-4.7 struggle with complex, multi-step advertising analytics despite reaching 76.9% accuracy on simpler tasks.
🧠 Claude