AI · Neutral · arXiv – CS AI · 5h ago · 6/10
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction
InfantAgent-Next is a multimodal AI agent that interacts with computers across text, images, audio, and video. Its modular architecture combines tool-based and vision-based approaches. The system achieves 7.27% accuracy on the OSWorld benchmark, higher than Claude's Computer Use, and is evaluated on both vision-based and more general benchmarks.
Claude