AI · Neutral · arXiv – CS AI · 10h ago · 7/10
CCTU: A Benchmark for Tool Use under Complex Constraints
Researchers introduce CCTU, a new benchmark for evaluating how well large language models use tools under complex constraints. The study finds that even state-of-the-art LLMs achieve task completion rates below 20% when strict constraint adherence is required, and that models violate constraints in over 50% of cases.