#self-supervised-learning News & Analysis

84 articles tagged with #self-supervised-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

A Deep Generative Model for Resting-State EEG Synthesis and Transferable Representation Learning

REST-GAN introduces a generative adversarial network framework for synthesizing resting-state EEG signals while learning transferable representations without manual feature engineering. The model demonstrates strong performance in reproducing key EEG properties and outperforms direct raw-signal approaches on demographic classification tasks, offering a computationally efficient alternative to existing EEG analysis methods.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Researchers introduced BrainG3N, a dual-purpose tokenizer combining a masked autoencoder encoder with a CNN decoder to generate clinically informative 3D brain MRI images. Pretrained on over 35,000 volumes across multiple disease categories and acquisition sites, the model simultaneously excels at downstream clinical tasks and enables controllable, conditional medical image generation.

AI · Neutral · arXiv – CS AI · Jun 19 · 5/10

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

Researchers fine-tuned pretrained Transformer speech models (Wav2Vec2.0, HuBERT, XLS-R) for Automatic Speech Recognition (ASR) of Quranic recitation, achieving an 8% word error rate against a 16.3% baseline. The study shows that domain-specific fine-tuning on 870+ hours of professional and user-recited Quranic audio, combined with Arabic text labels without diacritics, significantly improves transcription accuracy while cutting training time by 71%.
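For readers unfamiliar with the metric behind the 8% and 16.3% figures, here is a minimal sketch of word error rate (WER) as word-level edit distance divided by reference length. The function name `word_error_rate` and the sample strings are illustrative assumptions; this is not the evaluation code used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings: one substitution plus one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 2 edits / 4 words = 0.5
```

On this definition, a drop from 16.3% to 8% means roughly half as many word-level edits per reference word.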

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Researchers introduce Adaptive Binning, a self-supervised learning method for medical tabular data that dynamically adjusts feature discretization during training rather than using fixed global quantization. The approach combines curriculum learning with representation-aware binning to improve performance on unlabeled clinical datasets, alongside a new standardized benchmark for medical tabular SSL evaluation.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Researchers introduce SL-S4Wave, a self-supervised learning framework combining contrastive learning with structured state space models to analyze physiological waveforms like ECGs and EEGs. The approach outperforms existing methods in detecting arrhythmias, requires fewer labeled examples, and generalizes effectively across different cardiac conditions and brain signals.

AI · Bullish · arXiv – CS AI · Jun 19 · 6/10

HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-training

HilDA introduces a self-supervised pretraining framework for LiDAR systems in autonomous driving by combining hierarchical knowledge distillation from Vision Foundation Models with diffusion-based temporal consistency. The approach achieves state-of-the-art results on cross-modal distillation benchmarks and improves performance across 3D object detection, scene flow, and semantic occupancy prediction tasks.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

Pretrained self-supervised speech models can recognize unseen consonants

Researchers demonstrate that pretrained self-supervised speech models (Wav2Vec2 and HuBERT) can accurately recognize click consonants from low-resource Khoisan languages despite training data heavily skewed toward high-resource languages. Fine-tuning on click-rich language data reveals these models generalize better to rare phonemes than expected, suggesting self-supervision creates robust representations across diverse human speech sounds.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Automated Pronunciation Evaluation for Korean Toddler Speech using Speech Diarization and Self-Supervised Learning

Researchers have developed an automated system for evaluating Korean toddler pronunciation using speaker diarization and self-supervised learning models, addressing a significant gap in speech assessment tools for this demographic. The system achieved balanced accuracies of 0.720 for consonants and 0.845 for vowels by routing predictions through specialized SSL models, offering potential clinical applications for detecting speech sound disorders affecting nearly half of Korean pediatric cases.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Towards Robust Arabic Speech Emotion Recognition with Deep Learning

Researchers propose a CNN-Transformer hybrid architecture for Arabic Speech Emotion Recognition that achieves 98.1% accuracy, outperforming CNN-LSTM and fine-tuned wav2vec 2.0 models. The study addresses the underexplored challenge of emotion detection in Arabic speech by combining convolutional feature extraction with Transformer-based context modeling, demonstrating effectiveness in low-resource, dialectally diverse settings.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

CleanPatrick: A Benchmark for Image Data Cleaning

CleanPatrick introduces the first large-scale benchmark for image data cleaning, built on a dermatology dataset with nearly 500,000 human annotations identifying data quality issues like duplicates, off-topic samples, and label errors. The benchmark formalizes data cleaning as a ranking task and evaluates existing detection methods, revealing that self-supervised models excel at near-duplicate detection while traditional anomaly detectors remain competitive for constrained review scenarios.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

SlideCheck: Guiding Self-Supervised Pretraining of Pathology Foundation Models via Dataset Distributions

Researchers introduce SlideCheck, a data guidance tool for pathology foundation models that uses frozen model features to score and curate pretraining datasets. The system provides abnormality and malignancy scores to help organize and audit WSI-derived patch data, demonstrating that controlled dataset composition significantly influences downstream self-supervised learning outcomes.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

What Makes Video World Model Latents Action-Relevant: Prediction over Reconstruction

Researchers demonstrate that temporal video pretraining, not pixel reconstruction quality, drives action-relevant structure in video world model latent spaces. Across diverse encoder architectures, video-pretrained self-supervised models consistently outperform reconstruction-based approaches in recovering action information, with implications for developing more effective embodied AI systems.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Self-Supervised Vision Transformers for CBCT-Based Detection of Temporomandibular Joint Osteoarthritis

Researchers demonstrate that self-supervised Vision Transformers, particularly the DINO family, can effectively detect temporomandibular joint osteoarthritis from cone-beam CT scans with 90.2% AUC when partially adapted. The study shows that strategic backbone unfreezing of final transformer blocks outperforms fully frozen models and supervised baselines, providing practical guidance for deploying foundation models in medical imaging with limited training data.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Geometric Second-Order Feature Correlation Learning for Self-Supervised Speech Emotion Recognition

Researchers propose a Second-Order Correlation (SOC) layer that improves speech emotion recognition by modeling feature correlations as covariance descriptors rather than treating features independently. Using Log-Euclidean mapping to preserve geometric properties, the method demonstrates superior performance on standard emotion recognition datasets compared to conventional first-order aggregation approaches.
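To make the idea of a covariance descriptor with Log-Euclidean mapping concrete, here is a minimal NumPy sketch. It is a generic illustration of second-order pooling, not the paper's SOC layer; the function name `log_euclidean_descriptor`, the regularizer `eps`, and the random input are assumptions.

```python
import numpy as np


def log_euclidean_descriptor(frames: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Pool (T, D) frame-level features into a vectorized log-covariance descriptor."""
    centered = frames - frames.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (frames.shape[0] - 1)  # (D, D) second-order statistics
    cov = cov + eps * np.eye(cov.shape[0])               # keep the matrix positive definite
    # Matrix logarithm via eigendecomposition: log(C) = V diag(log lambda) V^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # log(C) is symmetric, so the upper triangle carries all the information.
    return log_cov[np.triu_indices(log_cov.shape[0])]


# Hypothetical input: 50 frames of 4-dimensional features.
rng = np.random.default_rng(0)
descriptor = log_euclidean_descriptor(rng.normal(size=(50, 4)))
print(descriptor.shape)  # (10,) = 4 * 5 / 2 upper-triangular entries
```

A first-order aggregator such as mean pooling would keep only the per-feature averages, whereas this descriptor also encodes how features co-vary, mapped into a flat space where ordinary Euclidean operations are meaningful.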

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Multilingual Multi-Speaker Unit Vocoders: A Systematic Analysis of Discrete Speech Representations

Researchers analyze how discrete speech units derived from self-supervised learning entangle phonetic, speaker, and language information in multilingual vocoder systems. The study demonstrates that cluster size directly controls intelligibility while explicit speaker conditioning prevents identity collapse, with implications for improving Audio LLMs and speech generation systems.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

The Variance Brain Foundation Models Forgot: Third-Order Statistics Predict Cognition Where Billion-Parameter Models Fail

Researchers demonstrate that brain foundation models (BFMs), billion-parameter Transformers trained on fMRI data, paradoxically predict cognitive performance worse than simple linear regression on functional connectivity matrices. The study identifies a variance allocation problem: BFM pretraining captures the dominant fMRI variance but destroys the higher-order statistical structure (third-order co-skewness) that actually predicts cognition. The authors address this with a lightweight linear pipeline that requires no pretraining.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning

Researchers develop a theoretical framework proving that contrastive learning—a dominant self-supervised AI technique—requires specific sampling diversity conditions to recover meaningful latent geometry. They demonstrate that standard approaches can learn non-orthogonal representations and propose a corrected InfoNCE variant, with experiments showing that architectural inductive bias becomes critical when sampling diversity is limited.
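As background for the InfoNCE objective discussed here, the following is a minimal NumPy sketch of the standard in-batch InfoNCE loss. It is not the corrected variant the paper proposes; the function name `info_nce`, the temperature value, and the random embeddings are assumptions for illustration.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Standard InfoNCE: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # cross-entropy on the diagonal


# Hypothetical embeddings: 8 samples, 16 dimensions.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # positives close to anchors
shuffled = info_nce(z, rng.normal(size=z.shape))            # positives unrelated to anchors
print(aligned < shuffled)
```

Because the negatives are simply whatever else lands in the batch, the loss value depends on how the batch is sampled, which is the kind of sampling condition the paper analyzes.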

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE

Researchers propose WEINCE, a modification to InfoNCE contrastive learning that corrects statistical misalignments in how softmax selects top-scoring examples using extreme value theory. The method adds anchor-wise batch statistics without trainable parameters and demonstrates consistent improvements across vision benchmarks.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction

Researchers propose an auxiliary reconstruction module to improve encoder representations in neural algorithmic reasoning systems. By forcing encoders to reconstruct input states and capture feature dependencies, the method enhances the performance of existing neural architectures on algorithmic reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

Researchers introduce UR-JEPA, a novel regularization technique for Joint-Embedding Predictive Architectures that addresses representation collapse by targeting uniformly rectifiable measures rather than isotropic Gaussians. The method demonstrates superior performance on Inet10 with a 0.83 percentage-point gain over existing approaches and produces geometrically distinct embeddings with sharper spectral drops, suggesting more structured learned representations.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors

Researchers propose a self-supervised framework for monocular depth and pose estimation in endoscopy using a Generative Latent Bank and VAE to improve 3D mapping of the gastrointestinal tract. The method achieves superior performance over existing self-supervised approaches on standard endoscopic datasets without requiring synthetic training data.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training

Researchers introduce Med-Scout, a reinforcement learning framework that addresses a critical flaw in multimodal large language models (MLLMs) used for medical diagnosis: geometric blindness, or the inability to ground outputs in objective spatial constraints. The system uses unlabeled medical images with three proxy tasks to derive supervision signals, achieving 40% performance improvements on a new Med-Scout-Bench benchmark while generalizing to broader medical understanding tasks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines

A systematic review of self-supervised learning (SSL) in medical imaging analyzes 75 studies to establish that SSL effectiveness depends on alignment between pretext task design, imaging modality, and clinical objectives. The research provides practical guidelines showing contrastive methods excel at classification while generative approaches better support segmentation, with no universal optimal strategy.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing

AnchorSteer is a new AI framework for music editing that maintains rhythmic and melodic structure while allowing semantic modifications through self-discovered concept vectors injected into diffusion models. The approach addresses a core tension in music AI: steering methods that enable high-level edits typically degrade structural integrity, while protective mechanisms suppress semantic control.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

STEP: Learning STructured Embeddings for Progressive Time Series

Researchers introduce STEP, a self-supervised learning method that creates interpretable representations of time series data showing irreversible state transitions like equipment degradation or task completion. The approach encodes progression information in geometric coordinates (polar angles and radius) without requiring labeled data, matching or exceeding black-box models while providing transparency into underlying mechanisms.
