#data-annotation News & Analysis

5 articles tagged with #data-annotation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles

AIBullisharXiv – CS AI · May 117/10

🧠

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations

ForgeVLA introduces a federated learning framework that enables Vision-Language-Action models to train on distributed robot data without centralizing sensitive information or requiring manual language annotations. The system uses embodied instruction classifiers to automatically generate missing language labels and addresses vision-language feature collapse through contrastive learning and adaptive aggregation.

AINeutralarXiv – CS AI · 5d ago6/10

🧠

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

Researchers introduce a behavioral cloning framework for scientific data annotation that learns from expert annotation strategies rather than direct prediction. The study demonstrates that larger models trained on multiple annotation tasks develop hierarchical skills, generalize across tasks, and internally represent latent variables of the annotation process, offering a foundation for automating labor-intensive verification and correction workflows.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Structured Exploration and Exploitation of Label Functions for Automated Data Annotation

Researchers introduce EXPONA, an automated framework for generating label functions that improve weak label quality in machine learning datasets. The system balances exploration across surface, structural, and semantic levels with reliability filtering, achieving up to 98.9% label coverage and 46% downstream performance improvements across diverse classification tasks.

AINeutralarXiv – CS AI · Mar 96/10

🧠

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

A systematic literature review of 346 papers reveals critical flaws in AI data annotation practices, arguing that treating human disagreement as 'noise' rather than meaningful signal undermines model quality. The study proposes pluralistic annotation frameworks that embrace diverse human perspectives instead of forcing artificial consensus.

AINeutralarXiv – CS AI · Mar 165/10

🧠

BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning

Researchers introduce BoSS (Best-of-Strategies Selector), a new oracle strategy for active learning that outperforms existing methods by using an ensemble approach to select optimal data annotation batches. The study reveals that current state-of-the-art active learning strategies still significantly underperform compared to oracle performance, particularly on large-scale datasets.