#dataset-creation News & Analysis

3 articles tagged with #dataset-creation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Researchers introduce GTA, a scalable framework that automatically generates realistic web agent tasks paired with executable trajectories. The system addresses critical limitations in existing benchmarks by combining crawling, retrieval-based seeding, and automated quality control to create multi-hop, cross-page tasks across 50+ websites. The resulting tasks reveal significant performance gaps between humans and AI agents.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

Researchers introduce Hard Negative Captions (HNC), an automatically generated dataset designed to improve vision-language models' ability to understand fine-grained mismatches between images and text. The work addresses a fundamental limitation of current image-text matching approaches: weakly paired web data fails to teach models detailed cross-modal comprehension. The authors report improved performance on diagnostic tasks and robustness under noisy conditions.

AI · Neutral · Hugging Face Blog · Dec 16 · 4/10 · 6

Introducing the Synthetic Data Generator - Build Datasets with Natural Language

The article title suggests the introduction of a synthetic data generator tool that allows users to build datasets using natural language commands. However, no article body content was provided for analysis.