y0news
#continuous-evolution: 1 article
AI · Bearish · arXiv – CS AI · 9h ago · 7/10

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Researchers introduce EvoClaw, a benchmark that evaluates AI agents on continuous software evolution rather than on isolated coding tasks. Across 12 frontier models, performance drops from above 80% on isolated tasks to at most 38% in continuous settings, which highlights how much AI agents struggle with long-term software maintenance.