y0news

#data-synthesis News & Analysis

7 articles tagged with #data-synthesis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Mar 5 · 6/10

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Researchers have identified 'preference leakage,' a contamination problem in LLM-as-a-judge systems where evaluator models show bias toward related data generator models. The study found this bias occurs when judge and generator LLMs share relationships like being the same model, having inheritance connections, or belonging to the same model family.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

Efficient Agent Training for Computer Use

Researchers introduced PC Agent-E, an efficient AI agent training framework that achieves human-like computer use with minimal human demonstration data. Starting with just 312 human-annotated trajectories and augmenting them with Claude 3.7 Sonnet synthesis, the model achieved a 141% relative improvement and outperformed Claude 3.7 Sonnet by 10% on the WindowsAgentArena-V2 benchmark.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and Classification

Researchers developed lightweight generative AI models for creating synthetic network traffic data to address privacy concerns and data scarcity in network traffic classification. The models achieved up to 87% F1-score when classifiers were trained solely on synthetic data, with transformer-based approaches providing the best balance of accuracy and computational efficiency.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Researchers introduce ArtiAgent, an automated system that creates pairs of real and artifact-injected images to help AI models better detect and fix visual artifacts in generated content. The system uses three specialized agents to synthesize 100K annotated images, addressing the cost and scalability challenges of human-labeled artifact datasets.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

Researchers have developed RawMed, the first framework to generate synthetic multi-table time-series Electronic Health Records (EHR) that closely resemble raw medical data. The system addresses privacy concerns in healthcare data sharing while maintaining fidelity and utility, outperforming baseline models in validation tests.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14

MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs

Researchers introduce MMKG-RDS, a framework that uses multimodal knowledge graphs to synthesize high-quality training data for improving AI model reasoning abilities. Testing on Qwen3 models showed a 9.2% improvement in reasoning accuracy, with applications for constructing complex benchmarks involving tables and formulas.