AI · Bearish · arXiv – CS AI · 6h ago · 7/10
C3-Bench: A Context-Aware Change Captioning Benchmark
Researchers introduce C3-Bench, a benchmark for evaluating change-captioning systems across 51 real-world contexts, built from 4,996 labeled image pairs. Testing 32 models shows that even state-of-the-art systems such as GPT-5.2 fail systematically when faced with unfamiliar change contexts, exposing a critical gap between lab performance and real-world reliability.