AI · Neutral · arXiv – CS AI · 9h ago · 6/10
TensorBench: Benchmarking Coding Agents on a Compiler-Based Tensor Framework
Researchers introduced TensorBench, a 199-task benchmark for evaluating coding agents on a PyTorch-based tensor framework. The benchmark targets the trade-off between task difficulty and evaluation reliability in repository-level coding benchmarks. Tests of seven frontier AI models showed wide performance variation, with pass rates ranging from 22.1% to 64.8%, suggesting distinct strengths across different coding agents.