y0news

#data-augmentation News & Analysis

28 articles tagged with #data-augmentation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Beyond Accuracy: Interpreting Topic Representation in Suicide Ideation Detection Models

Researchers demonstrate that suicide ideation detection models trained with topic-augmented datasets develop more interpretable internal representations of psychological risk factors. The study moves beyond standard accuracy metrics to examine how AI systems encode mental health concepts, revealing that augmentation clarifies underrepresented factors like immigration stress, family issues, and financial crisis.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Researchers demonstrate that synthetic fMRI data generated by TRIBE v2, a large pretrained encoding model, can significantly improve brain-to-image decoding performance in low-data scenarios, achieving up to 68% improvement in accuracy. The findings suggest that foundation models trained on extensive neural data can enhance data efficiency for brain decoding tasks and enable zero-shot capabilities.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators

Researchers introduce CauSim, a framework that enables large language models to improve causal reasoning by constructing increasingly complex executable causal simulators. The approach transforms causal reasoning from a scarce-data problem into a scalable supervised learning task, allowing LLMs to generate synthetic training data and demonstrate improved performance across different representations.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Large Language Models for Market Research: A Data-augmentation Approach

Researchers propose a novel statistical framework for integrating Large Language Model-generated data with real human data in conjoint analysis, addressing the bias gap between synthetic and authentic consumer responses. The approach delivers 24.9-79.8% cost and data savings while maintaining statistical robustness, validating that LLM data serves as a complement rather than substitute for human market research.

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

SNR-ST-Mix: Sample-specific Neighborhood Regression Mixup for Augmented Spatial Transcriptomics Imputation with Deep Neural Network

Researchers introduce SNR-ST-Mix, a data augmentation framework designed specifically for spatial transcriptomics that uses geometry-aware and expression-aware mixing to improve deep neural network performance. The method constrains data interpolation to k-nearest spatial neighbors and weights coefficients by expression similarity, enabling more biologically plausible synthetic training samples that enhance prediction accuracy without architectural changes.
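
The idea of restricting mixup to spatial neighbors and weighting by expression similarity can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine similarity measure, and the weighting rule `lam = 1 - 0.5 * sim` are assumptions chosen only to show the shape of the technique.

```python
import math
import random


def knn_indices(coords, i, k):
    """Return the indices of the k spots spatially closest to spot i."""
    dists = [(math.dist(coords[i], c), j) for j, c in enumerate(coords) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def cosine(u, v):
    """Cosine similarity of two expression vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighborhood_mixup(coords, expr, i, k=3, rng=random):
    """Mix spot i with one of its k nearest spatial neighbors.

    Hypothetical weighting (an assumption, not from the paper): the more
    similar the partner's expression, the more it contributes, capped at 0.5.
    """
    j = rng.choice(knn_indices(coords, i, k))
    sim = max(0.0, cosine(expr[i], expr[j]))
    lam = 1.0 - 0.5 * sim  # weight kept on the anchor spot
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(expr[i], expr[j])]
    return mixed, j, lam
```

In this sketch a distant spot can never be chosen as a mixing partner, and a neighbor with dissimilar expression contributes little, which is one way to read the "geometry-aware and expression-aware" description above.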

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision

Researchers introduce a novel anomaly detection framework combining visual prompting, unfrozen teacher models, and diffusion-based data augmentation to address real-world limitations in industrial inspection systems. The approach achieves a 3.5 percentage point improvement on the challenging AeBAD dataset, demonstrating practical applicability beyond controlled laboratory conditions.

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

Large Language Models for Imbalanced Classification: Diversity makes the difference

Researchers have developed a novel LLM-based oversampling method to address imbalanced classification in machine learning, focusing on generating diverse synthetic minority samples. The approach outperforms existing methods like SMOTE by preserving categorical information and introducing enhanced diversity through novel sampling and fine-tuning strategies.
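
For context on the SMOTE baseline mentioned above, a minimal sketch of its standard interpolation step follows. It shows only the baseline, not the LLM-based method; the function name `smote_sample` and its parameters are illustrative assumptions.

```python
import math
import random


def smote_sample(minority, i, k=2, rng=random):
    """Create one synthetic point by interpolating between minority sample i
    and one of its k nearest minority-class neighbors (the core SMOTE step)."""
    x = minority[i]
    others = [j for j in range(len(minority)) if j != i]
    neighbors = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
    x_n = minority[rng.choice(neighbors)]
    u = rng.random()  # interpolation factor in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, x_n)]
```

Because the synthetic point lies on the line segment between two real samples, an integer-coded categorical feature can come out as a fractional value with no meaning, which is the kind of information loss the summary says the LLM-based approach avoids.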

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios

Researchers demonstrate that synthetic MR images generated by conditional neural networks can effectively augment training datasets for automated focal cortical dysplasia detection, reducing the need for manual annotations by approximately 20% while maintaining diagnostic sensitivity. Expert radiologists struggled to distinguish synthetic from real images, which supports the realism of the generated data, though real data remains superior when available.

AI · Bullish · arXiv – CS AI · 5d ago · 6/10

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

Researchers developed Binary Gaussian Copula Synthesis (BGCS), an LLM-powered data augmentation method that addresses severe class imbalance in chronic kidney disease datasets to improve early dialysis prediction. Tested on 15,169 CKD patients, BGCS outperformed existing methods like SMOTE and CTGAN, achieving 78-87% minority-class recall and enabling deployment in interpretable clinical decision-support systems.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

OA-CutMix: Correcting the Label Bias of CutMix

Researchers propose Object-Aware CutMix (OA-CutMix), a corrected version of the widely-used CutMix data augmentation technique that fixes a fundamental labeling bias where patch area doesn't accurately reflect semantic contribution. The method uses segmentation masks to assign labels proportional to visible object area, consistently outperforming existing mixing methods across multiple architectures and datasets.
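
The difference between area-based and object-aware labels can be made concrete with a small sketch. It assumes binary segmentation masks given as nested lists and a paste box `(top, left, bottom, right)`. The weighting rule below is one plausible reading of "labels proportional to visible object area", not the paper's exact formula, and the function names are hypothetical.

```python
def cutmix_area_weight(height, width, box):
    """Standard CutMix label weight for the base image: 1 - patch_area / image_area."""
    top, left, bottom, right = box
    patch_area = (bottom - top) * (right - left)
    return 1.0 - patch_area / (height * width)


def object_aware_weight(mask_a, mask_b, box):
    """Label weight for base image A from visible object pixels.

    mask_a, mask_b: binary object masks (1 = object pixel) for the base image A
    and the pasted image B. Inside the box B is visible; outside it A is visible.
    Falls back to the area-based weight when no object pixel is visible
    (an assumption made for this sketch).
    """
    top, left, bottom, right = box
    height, width = len(mask_a), len(mask_a[0])
    visible_a = visible_b = 0
    for r in range(height):
        for c in range(width):
            if top <= r < bottom and left <= c < right:
                visible_b += mask_b[r][c]
            else:
                visible_a += mask_a[r][c]
    total = visible_a + visible_b
    if total == 0:
        return cutmix_area_weight(height, width, box)
    return visible_a / total
```

With a 4x4 image where A's object occupies the top-left 2x2 block and the pasted bottom-right 2x2 patch is entirely B's object, the area-based rule assigns A a weight of 0.75, while the object-aware rule assigns 0.5 because both objects show the same number of pixels.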

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis

Researchers propose a unified deep learning framework combining ResNet-based CNNs with attention mechanisms and novel data augmentation techniques for analyzing biomedical time-series signals like ECG and EEG. The approach achieves near-perfect accuracy (99.78-100%) on benchmark datasets while remaining lightweight enough for wearable deployment, addressing critical gaps in multi-signal analysis and class imbalance handling.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems

Researchers introduce SCALR, a framework that generates synthetic user-item interaction data across recommendation system domains by leveraging observed events from source domains. The approach addresses data sparsity challenges in large-scale recommendation systems and demonstrates statistically significant improvements in industrial A/B testing.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

Controllable Lung Nodule Synthesis via Histogram-Regularized Latent Diffusion Models

Researchers propose a histogram-regularized latent diffusion model that synthesizes realistic lung nodules in 3D CT volumes while accurately preserving intensity distributions characteristic of different nodule subtypes. The method addresses limitations in existing generative approaches by constraining lesion-level intensity profiles during synthesis, enabling improved data augmentation for cancer screening systems and better performance on underrepresented nodule types.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection

Researchers propose GiPL, a two-branch machine learning framework that combines iterative pseudo-labeling with generative data augmentation to improve cross-domain few-shot object detection using vision-language models. The method demonstrates significant performance improvements on three benchmark datasets, addressing critical challenges in fine-tuning with limited target-domain samples.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

A Survey on Recent Advances in Conversational Data Generation

A comprehensive survey examines recent advances in synthetic dialogue data generation for conversational AI systems, addressing the challenge of data scarcity in training. The research categorizes methods across open-domain, task-oriented, and information-seeking dialogue systems, proposing a framework for generating multi-turn conversations at scale while maintaining quality standards.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Taming Data Challenges in ML-based Security Tasks Using Generative AI

Researchers propose using Generative AI to augment training datasets with synthetic data, improving machine learning security classifiers by up to 32.6% even with minimal training samples. The study evaluates six state-of-the-art GenAI methods across seven security tasks and introduces Nimai, a novel controlled data synthesis scheme, while identifying limitations in GenAI applicability to certain security domains.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction

Researchers propose SSDAU, a novel data augmentation method for Joint Entity and Relation Extraction that preserves semantic structure and context awareness. The approach significantly outperforms existing methods by reducing F1 score degradation to 8.26% compared to 31.91% for baseline approaches, addressing a critical challenge in NLP model generalization.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Personalized Generative Models for Contextual Debiasing

Researchers introduce DecoupleGen, a method that uses personalized text-to-image diffusion models to generate training data featuring objects in rare contextual scenarios. This approach addresses a critical limitation in computer vision models that perform better on common object-context combinations, potentially improving recognition accuracy for edge cases without requiring expensive real-world data collection.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport

Researchers have developed OT-Bridge Editor, an AI method that uses optimal transport theory to synthesize realistic coronary angiography images with artificial stenosis lesions. The technique achieves a 27.8% improvement in stenosis detection performance on benchmark datasets, addressing the critical shortage of high-quality medical imaging training data.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation

Researchers introduce AtteConDA, a novel approach to multi-condition image generation that resolves conflicts between simultaneous conditions (segmentation, depth, edges) to improve synthetic data quality for autonomous driving. The method enables more reliable data augmentation while preserving detailed scene structure, addressing critical data scarcity challenges in high-level driving task recognition.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

Grounding Synthetic Data Generation With Vision and Language Models

Researchers introduce ARAS400k, a large-scale remote sensing dataset containing 400k images (100k real, 300k synthetic) with segmentation maps and descriptions. The study demonstrates that combining real and synthetic data consistently outperforms training on real data alone for semantic segmentation and image captioning tasks.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6

LangGap: Diagnosing and Closing the Language Gap in Vision-Language-Action Models

Researchers reveal that state-of-the-art Vision-Language-Action (VLA) models largely ignore language instructions despite achieving 95% success on standard benchmarks. The new LangGap benchmark exposes significant language understanding deficits, with targeted data augmentation only partially addressing the fundamental challenge of diverse instruction comprehension.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Augmenting Research Ideation with Data: An Empirical Investigation in Social Science

Researchers developed a framework that improves AI-generated research ideas by incorporating relevant data during the ideation process. The approach increased idea feasibility by 20% and overall quality by 7%, with human studies confirming that data-augmented AI assistance helps researchers generate higher-quality ideas.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 19

BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation

Researchers developed BRIDGE, a framework to reduce bias in AI-powered automated scoring systems that unfairly penalize English Language Learners (ELLs). The system addresses representation bias by generating synthetic high-scoring ELL samples, achieving fairness improvements comparable to using additional human data while maintaining overall performance.

Page 1 of 2