#self-supervised-learning News & Analysis

84 articles tagged with #self-supervised-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, a machine learning architecture that combines imitation learning with self-supervised learning to improve policy representation in AI decision-making systems. The model achieves up to 40% improvement in world model understanding and 10% higher task success rates by jointly predicting action sequences and latent observation sequences, working in latent space rather than on raw inputs.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

SPOTR: Spatio-temporal Pooling One-Token Reconstruction for Universal Physiological Signal Self-supervised Learning

SPOTR, a new self-supervised learning framework, significantly advances physiological signal processing by using a single-token bottleneck to compress and reconstruct EEG, ECG, PPG, and iEEG signals. The model demonstrates substantial performance improvements across 20 datasets while cutting latency by 78% and GPU memory use by 52% compared to existing foundation models.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Backdoor Attacks on Speech Emotion Recognition via TTS-Generated Poisoning

Researchers present the first systematic study of poisoning-based backdoor attacks on Speech Emotion Recognition (SER) systems using text-to-speech generated audio. The study reveals that modern SER models can be reliably compromised with imperceptible acoustic triggers while maintaining normal performance on benign inputs, exposing critical vulnerabilities in AI systems that process voice data.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

Researchers propose a self-supervised reinforcement learning framework that improves large language models' spatial reasoning capabilities through consistency verification rather than labeled data. The approach, which uses geometric and semantic consistency checks across image and text transformations, achieves performance comparable to supervised fine-tuning without ground-truth annotations.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

Researchers introduce Hypnos, a multi-modal foundation model trained on next-token prediction that learns generalizable representations of sleep physiology from over 20,000 polysomnography recordings across eight sensing modalities. The model achieves performance parity with supervised baselines on sleep stage classification while using 100× less labeled data and demonstrates cross-domain generalization by outperforming specialized models on daytime cardiac tasks.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Researchers introduce Retrospective Harness Optimization (RHO), a self-supervised method that enables AI agents to improve their capabilities using only historical trajectory data without requiring external validation sets. The approach improved performance on SWE-Bench Pro from 59% to 78% pass rate in a single optimization round, demonstrating practical effectiveness across software engineering, technical work, and knowledge domains.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Researchers propose FINO, a label-free method for adapting vision foundation models to specialized scientific domains using existing metadata rather than expensive labeled datasets. The approach combines self-supervised learning with metadata guidance, demonstrating superior performance across microscopy, Earth observation, and medical imaging compared to both unsupervised and fully supervised alternatives.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

CoilDrop-MRI: Self-supervised physics-guided MRI reconstruction with coil dropout

Researchers introduce CoilDrop-MRI, a self-supervised deep learning method that improves accelerated MRI reconstruction by strategically dropping data across receiver coils rather than only in k-space. Validated across multiple hospital sites and field strengths, the approach matches supervised methods' quality without requiring fully sampled training data, offering practical efficiency gains for medical imaging.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

Researchers introduce DLLM-JEPA, a new self-supervised learning approach that combines Joint Embedding Predictive Architectures with masked-diffusion language models. The method eliminates the need for explicit multi-view training data and reduces computational costs by 33% compared to prior LLM-JEPA while achieving significant performance improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MindZero: Learning Online Mental Reasoning With Zero Annotations

MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models to perform robust Theory of Mind reasoning without requiring annotated mental state data. The approach combines model-based planning with neural scaling, achieving superior accuracy and efficiency compared to traditional model-based methods and LLMs alone.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

RayDer introduces a unified transformer architecture that consolidates camera estimation, scene reconstruction, and rendering into a single model for self-supervised novel view synthesis from real-world video. The system achieves clean power-law scaling with data and compute while maintaining competitive performance with supervised approaches, addressing a key scalability challenge in 3D vision.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Event Fields: Learning Latent Event Structure for Waveform Foundation Models

Researchers introduce a novel waveform foundation model that represents physiological signals as latent event processes rather than sequential tokens, using self-supervised learning to capture clinically meaningful structure. The approach demonstrates improved performance on medical benchmarks including arrhythmia classification and hemodynamic prediction, suggesting event-centric representations may be more suitable for healthcare AI than traditional sequence-based methods.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs

Researchers propose TPAW, a self-play algorithm that improves LLM alignment without human-labeled data by having models collaborate and compete against historical checkpoints while using adaptive weighting mechanisms. The approach addresses instability and diminishing optimization gains in existing self-training methods, demonstrating consistent improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Researchers propose a gradient-based bilevel optimization method that automatically learns composite loss weights during pretraining by aligning gradients with downstream objectives. The approach reduces hyperparameter tuning overhead to ~30% above baseline training cost while matching or exceeding manually tuned baselines across event-sequence and computer vision tasks.
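
As a rough illustration of the gradient-alignment idea in this summary, the sketch below weights each component loss by how closely its gradient points in the same direction as a downstream gradient. This is not the paper's algorithm, whose bilevel details are not given here: the function names, the cosine-similarity score, the softmax weighting, and the toy gradient vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_weights(component_grads, downstream_grad, temperature=1.0):
    """Hypothetical helper: softmax over gradient/downstream cosine similarities.

    Components whose gradients align with the downstream gradient get larger
    weights; components that oppose it get smaller ones.
    """
    scores = [cosine(g, downstream_grad) / temperature for g in component_grads]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy gradients (assumed values): aligned, orthogonal, and opposed to the target.
grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target = [1.0, 0.0]
weights = alignment_weights(grads, target)
```

In this toy setup the aligned component receives the largest weight and the opposed component the smallest; a lower `temperature` would sharpen that preference.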

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning

Researchers propose Intelligent Partitioning for Self-supervised Denoising (iPSD), a deep learning method that eliminates the need for artifact-free training data to denoise electroencephalogram (EEG) signals from wearable devices. The technique achieves state-of-the-art performance even in extremely noisy conditions by learning to partition noisy EEG segments into independent realizations sharing the same underlying neural signal.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness

Researchers introduce Pan-FM, a foundation model trained on multimodal medical imaging from seven organs that addresses the critical problem of missing data in real-world biomedical datasets. The model uses Saliency-Guided Masking to prevent bias toward dominant organs and demonstrates superior performance on disease prediction tasks across the UK Biobank.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Researchers propose a label-free self-supervised reinforcement learning framework that enables language models to follow complex multi-constraint instructions without external supervision. The approach derives reward signals directly from instructions and uses constraint decomposition strategies to address sparse reward challenges, demonstrating strong performance across both in-domain and out-of-domain instruction-following tasks.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

TimeRewarder is a new machine learning method that learns dense reward signals from passive videos to improve reinforcement learning in robotics. By modeling temporal distances between video frames, the approach achieves 90% success rates on Meta-World tasks using significantly fewer environment interactions than prior methods, while also leveraging human videos for scalable reward learning.
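
To make the notion of frame-wise temporal distance concrete, here is a minimal sketch of how a passive video could supply its own training labels, and how a distance predictor might then be turned into per-step rewards. It is an assumption-laden illustration, not TimeRewarder's implementation: `temporal_distance_pairs`, `dense_rewards`, the normalization by video length, and the stand-in predictor are all hypothetical.

```python
from itertools import combinations


def temporal_distance_pairs(num_frames):
    """Build (i, j, distance) labels for every frame pair with i < j.

    The distance is the frame gap normalized by the video length, so it lies
    in (0, 1]. No human annotation is needed: the labels come from frame order.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    span = num_frames - 1
    return [(i, j, (j - i) / span) for i, j in combinations(range(num_frames), 2)]


def dense_rewards(predict_distance, frames):
    """Score each consecutive frame pair with a (learned) distance predictor.

    A step that the predictor judges to move further along the task earns
    a larger reward.
    """
    return [predict_distance(a, b) for a, b in zip(frames, frames[1:])]


# Self-supervised labels for a three-frame video.
pairs = temporal_distance_pairs(3)

# Stand-in predictor (assumption): frames are scalar "progress" features and
# the predicted distance is simply their difference.
frames = [0.0, 0.2, 0.5, 1.0]
rewards = dense_rewards(lambda a, b: b - a, frames)
```

In practice the predictor would be a network trained on labels like `pairs`; the lambda here only stands in for it so the example runs on its own.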

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks

Researchers introduce SimPhysNet, a self-supervised learning algorithm that predicts laser welding penetration with 96.06% accuracy using only 200 labeled images, roughly 5% of typical datasets. The physics-informed neural network approach combines contrastive learning with few-shot learning to overcome the industrial manufacturing challenge of requiring extensive labeled data for quality assurance.

AI · Neutral · arXiv – CS AI · Jun 25 · 5/10

HiT-JEPA: A Hierarchical Self-supervised Trajectory Embedding Framework for Similarity Computation

Researchers introduce HiT-JEPA, a hierarchical self-supervised learning framework that represents urban trajectory data across multiple semantic levels to improve similarity computation. The model captures fine-grained movement details, intermediate patterns, and high-level abstractions simultaneously, addressing limitations in existing approaches that struggle to balance local nuances with global dependencies.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Rethinking Object-Centric Representations for Video Dynamics Modeling

Researchers introduce STAITUS, a machine learning framework that improves unsupervised video object tracking by explicitly separating appearance features from geometric pose information in slot-based representations. The approach addresses a fundamental problem where enforcing temporal consistency causes models to mistrack moving objects and fragment identities, achieving superior performance on tracking stability and segmentation quality.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Brain-Inspired Stochastic Joint Embedding Representation Learning

Researchers introduce PhiNet v2, a brain-inspired machine learning architecture that learns visual representations from temporal image sequences without heavy data augmentation, achieving competitive performance with state-of-the-art models while mimicking biological visual processing more closely.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Temporal Graph Pattern Machine

Researchers introduce Temporal Graph Pattern Machine (TGPM), a foundation framework that learns generalized evolving patterns in dynamic networks using Transformer architecture and self-supervised pre-training. The model achieves top performance on temporal link prediction and node classification tasks while demonstrating strong cross-domain transferability, addressing limitations of existing task-centric approaches.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

SOHET: Sequence Of Heterogeneous Events Transformer with Self-Supervised Pre-Training

Researchers introduce SOHET, a transformer-based architecture for processing heterogeneous event streams with self-supervised pre-training capabilities. The model demonstrates significant performance improvements on fraud detection and sequential prediction tasks, outperforming existing methods by 5.8% on a large-scale benchmark while achieving faster convergence.

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

How Well Do Self-Supervised Speech Models Encode Age and Gender in Children's Speech? A Layer-Wise Analysis Across Multiple Architectures

Researchers conducted a comprehensive layer-wise analysis of how four major self-supervised learning (SSL) speech models encode age and gender information in children's speech. The study reveals that age and gender cues are unevenly distributed across model layers, with early-to-mid layers capturing the strongest paralinguistic signals, and demonstrates reliable classification accuracy even from 1-3 second audio segments.
