AI · Bullish · arXiv CS AI · 1d ago · 6/10
LensWalk: Agentic Video Understanding by Planning How You See in Videos
Researchers introduced LensWalk, an agentic framework that lets Large Language Models actively control how they observe a video through dynamic temporal sampling. The system runs a reason-plan-observe loop to gather evidence progressively, and it improves accuracy by 5% on challenging video benchmarks without any model fine-tuning.
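
To make the reason-plan-observe idea concrete, here is a minimal toy sketch in Python. It is not the LensWalk implementation: the names `Plan`, `sample_timestamps`, and `reason_plan_observe` are hypothetical, the `observe` callback stands in for a vision model scoring a frame, and the "reasoning" step is a simple best-score heuristic instead of an LLM. It only illustrates how a loop can choose where and how densely to sample next based on the evidence gathered so far.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Plan:
    """Hypothetical observation plan: a time window and a frame budget."""
    start: float      # window start, in seconds
    end: float        # window end, in seconds
    num_frames: int   # how many frames to sample inside the window


def sample_timestamps(plan: Plan) -> List[float]:
    """Return uniformly spaced timestamps covering the planned window."""
    if plan.num_frames == 1:
        return [(plan.start + plan.end) / 2]
    step = (plan.end - plan.start) / (plan.num_frames - 1)
    return [plan.start + i * step for i in range(plan.num_frames)]


def reason_plan_observe(
    duration: float,
    observe: Callable[[float], float],
    threshold: float = 0.99,
    max_rounds: int = 4,
    num_frames: int = 5,
) -> Tuple[float, List[Tuple[float, float]]]:
    """Toy coarse-to-fine loop (an assumption, not the paper's algorithm).

    `observe(t)` is a stand-in for a vision model: it returns a relevance
    score in [0, 1] for the frame at time t.
    """
    plan = Plan(0.0, duration, num_frames)
    evidence: List[Tuple[float, float]] = []
    for _ in range(max_rounds):
        # Observe: look only at the frames the current plan asks for.
        evidence.extend((t, observe(t)) for t in sample_timestamps(plan))
        # Reason: find the most relevant moment seen so far.
        best_t, best_score = max(evidence, key=lambda pair: pair[1])
        if best_score >= threshold:
            break
        # Plan: halve the window and centre it on the best moment.
        half = (plan.end - plan.start) / 4
        plan = Plan(max(0.0, best_t - half), min(duration, best_t + half), num_frames)
    best_t, _ = max(evidence, key=lambda pair: pair[1])
    return best_t, evidence
```

For example, on a 100-second video whose relevant event sits at 37 s, calling `reason_plan_observe(100.0, lambda t: 1.0 - abs(t - 37.0) / 100.0)` first samples the whole video coarsely, then narrows to the first 50 seconds and stops at 37.5 s after inspecting 10 frames in total. A real agent would replace the scoring callback with model calls and let the language model decide the next window.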