y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#dataset-creation News & Analysis

3 articles tagged with #dataset-creation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles
AIBullisharXiv – CS AI · May 297/10
🧠

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Researchers introduce GTA, a scalable framework for automatically generating realistic web agent tasks paired with executable trajectories at scale. The system addresses critical limitations in existing benchmarks by combining crawling, retrieval-based seeding, and automated quality control to create multi-hop, cross-page tasks across 50+ websites, revealing significant performance gaps between human and AI agents.

AINeutralarXiv – CS AI · May 96/10
🧠

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

Researchers introduce Hard Negative Captions (HNC), an automatically generated dataset designed to improve vision-language models' ability to understand fine-grained mismatches between images and text. The work addresses a fundamental limitation in current image-text matching approaches, where weakly paired web data fails to teach models detailed cross-modal comprehension, demonstrating improved performance on diagnostic tasks and robustness under noisy conditions.