#representation-learning News & Analysis

162 articles tagged with #representation-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

It Just Takes Two: Scaling Amortized Inference to Large Sets

Researchers introduce a novel training strategy for neural posterior estimation that decouples representation learning from posterior modeling, enabling amortized inference on large observation sets by training only on pairs of examples. The approach dramatically reduces computational requirements while maintaining or improving performance across diverse benchmarks, making scalable Bayesian inference practical for real-world applications.

AI · Bearish · arXiv – CS AI · May 4 · 7/10

Language Models Struggle to Use Representations Learned In-Context

A new research study reveals that large language models struggle to effectively use representations they learn from in-context information, even though they can encode this information internally. The findings suggest current LLMs have fundamental limitations in adapting to novel contexts, affecting their ability to generalize learned patterns to downstream tasks.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining

Researchers have developed a method using sparse crosscoders to track how large language models learn linguistic concepts during training, introducing a new metric called Relative Indirect Effects (RelIE) to identify when specific features become causally important. This approach provides interpretable, fine-grained visibility into representation learning throughout pretraining, advancing understanding of how LLMs acquire abstract capabilities.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Do Sparse Autoencoders Capture Concept Manifolds?

Researchers demonstrate that sparse autoencoders (SAEs) capture semantic concepts along low-dimensional manifolds rather than isolated linear directions, and that existing architectures recover these continuous structures only suboptimally, fragmenting them in an effect called dilution. The findings suggest future interpretability methods should treat geometric objects, rather than individual feature directions, as the fundamental units.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers demonstrate that expert specialization in Mixture-of-Experts (MoE) large language models emerges from hidden-state geometry rather than from a specialized routing architecture, challenging assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that understanding MoE specialization remains as difficult as the broader unsolved problem of interpreting LLM internal representations.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

Social-JEPA: Emergent Geometric Isomorphism

Researchers developed Social-JEPA, showing that separate AI agents learning from different viewpoints of the same environment develop internal representations that are mathematically aligned through approximate linear isometry. This enables models trained on one agent to work on another without retraining, suggesting a path toward interoperable decentralized AI vision systems.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 2

Why Does RLAIF Work At All?

Researchers propose the 'latent value hypothesis' to explain why Reinforcement Learning from AI Feedback (RLAIF) enables language models to self-improve through their own preference judgments. The theory suggests that pretraining on internet-scale data encodes human values in representation space, which constitutional prompts can elicit for value alignment.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 3

Unsupervised Representation Learning -- an Invariant Risk Minimization Perspective

Researchers propose an unsupervised framework for Invariant Risk Minimization (IRM) that learns robust representations without labeled data. The approach introduces two methods, Principal Invariant Component Analysis (PICA) and Variational Invariant Autoencoder (VIAE), that capture structures that stay invariant across different environments.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity

Researchers developed a method to improve EEG-based music identification by using artificial neural networks that distinguish between acoustic and expectation-related brain representations. The approach combines both types of neural representations to achieve better performance than traditional methods, potentially advancing brain-computer interfaces and neural decoding applications.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

SiNGER: A Clearer Voice Distills Vision Transformers Further

Researchers introduce SiNGER, a new knowledge distillation framework for Vision Transformers that suppresses harmful high-norm artifacts while preserving informative signals. The technique uses nullspace-guided perturbation and LoRA-based adapters to achieve state-of-the-art performance in downstream tasks.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence

Researchers propose the Compression Efficiency Principle (CEP) to explain why artificial neural networks and biological brains develop similar representations despite different substrates. The theory suggests both systems converge on efficient compression strategies that encode stable invariants rather than unstable correlations, providing a unified framework for understanding intelligence across biological and artificial systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks

Researchers propose that intrinsic task symmetries drive 'grokking', the sudden transition from memorization to generalization in neural networks. The study identifies a three-stage training process and introduces diagnostic tools to predict and accelerate the onset of generalization in algorithmic reasoning tasks.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5

VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations

Researchers have developed VQ-Style, a new AI method that uses Residual Vector Quantized Variational Autoencoders to separate style from content in human motion data. The technique enables effective motion style transfer without requiring fine-tuning for new styles, with applications in animation, gaming, and digital content creation.

AI · Neutral · arXiv – CS AI · Jun 25 · 5/10

Elo-Disentangled Player-Style Embeddings for Human Chess via Rating-Conditioned Residual Move Model

Researchers developed a machine learning approach that separates chess playing strength (Elo rating) from individual player style by using a rating-conditioned base model combined with learned player embeddings. The method achieves 27-37% relative improvement in move prediction accuracy over existing models while successfully disentangling stylistic preferences from playing skill level.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Orthogonal Representation Editing: Decoupling Semantic Entanglement in Batch Knowledge Editing of LLMs

Researchers propose Orthogonal Representation Editing (ORE), a novel method for efficiently updating factual knowledge in Large Language Models without full retraining. The technique addresses a critical limitation in batch knowledge editing by decoupling semantic representation entanglement through orthogonal constraints, demonstrating superior performance including cross-lingual capabilities.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Brain-Inspired Stochastic Joint Embedding Representation Learning

Researchers introduce PhiNet v2, a brain-inspired machine learning architecture that learns visual representations from temporal image sequences without heavy data augmentation. It achieves performance competitive with state-of-the-art models while mimicking biological visual processing more closely.

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

The Impact of VAE Design on Latent Pose Representations for Diffusion-based Sign Language Production

Researchers investigate how variational autoencoder (VAE) design choices affect latent space properties in sign language production systems using diffusion models. Testing on the Phoenix14T dataset reveals that downstream generative performance correlates more strongly with latent space structure than with traditional reconstruction metrics, suggesting current evaluation methods may miss critical factors influencing model quality.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

HERMAN: Hierarchical Representation Matching for CLIP-based Class-Incremental Learning

HERMAN introduces a hierarchical representation matching framework for CLIP-based class-incremental learning, using LLM-generated textual descriptors to capture multi-level semantic relationships. The approach addresses limitations in existing vision-language models by leveraging hierarchical visual concepts rather than simplistic templates, demonstrating improved performance on multiple benchmarks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning

Researchers introduce PoLAR, a novel latent action representation framework that uses radial-direction structure in hyperbolic space to separately encode transition extent and mode for robot policy learning. The method improves downstream performance across simulation and real-world experiments by leveraging temporal gaps as a proxy for transition magnitude, outperforming existing latent action baselines and vision-language models.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Unsupervised Disentanglement Without Compromises : How Functional Orthogonality Enforces Identifiability

Researchers present a novel approach to unsupervised disentangled representation learning using functional orthogonality constraints on the Jacobian of generative models. The method proves identifiability of nonlinear generative models without requiring statistical independence or causal assumptions, challenging previous impossibility claims in the field.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Enhancing Protein Representation Learning via Manifold Restore Mixing

Researchers propose Manifold Restore Mixing (MRM), a novel data augmentation method that addresses structural degradation issues in protein representation learning by mixing hidden representations of original and augmented protein data. The approach combines manifold mixup techniques with a difficulty scheduler to generate training samples that preserve protein structure while introducing beneficial variations.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs

Researchers introduce SSProNet, a graph neural network that improves protein representation learning by incorporating secondary structure information and energy-filtered hydrogen-bond interactions. The approach demonstrates consistent improvements over existing graph-based methods while offering enhanced biological interpretability aligned with established structural motifs.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

The Hidden Evolution of Disguised Visual Context inside the VLM

Researchers conducted a controlled comparison of two architectural approaches for integrating visual information into large language models (LLMs), revealing that visual tokens undergo progressive transformation as they traverse network layers. The study demonstrates that integration paradigm choice fundamentally affects how visual features align with language space and model performance across vision-language tasks.

🏢 Meta

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Sensorimotor World Models: Perception for Action via Inverse Dynamics

Researchers introduce Sensorimotor World Models (SMWM), a latent world model that uses inverse dynamics regularization to learn action-aligned representations from high-dimensional observations. The approach addresses representation collapse in JEPA-style models while enabling efficient planning without frozen encoders or complex regularizers, demonstrating competitive performance on control tasks.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

Researchers introduce 'fragility' as a complementary metric to linear probing for analyzing large language model pre-training, addressing the limitation that probe accuracy saturates early in training and becomes insensitive to ongoing representational changes. By measuring activation noise tolerance levels, fragility reveals structural evolution in how models encode lexical versus compositional information across layers, demonstrating that data curation and architectural choices leave distinct signatures invisible to traditional accuracy metrics.
