#representation-learning News & Analysis

162 articles tagged with #representation-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, a machine learning architecture that combines imitation learning with self-supervised learning to improve policy representations in AI decision-making systems. The model achieves up to 40% improvement in world model understanding and 10% higher task success rates by jointly predicting action sequences and latent observation sequences rather than predicting raw inputs.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Erased, but Not Gone: Output Forgetting Is Not True Forgetting

Researchers demonstrate that machine unlearning methods that appear successful at the output layer, the standard place to evaluate them, still retain structured residual information in representation space when compared with a model that was truly retrained. This finding reveals a critical gap between apparent forgetting and genuine forgetting, suggesting current unlearning evaluations systematically overestimate effectiveness.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm

Researchers present a novel cross-modal knowledge distillation framework that enables large teacher models trained on one data type (e.g., images) to effectively guide smaller student models trained on different modalities (e.g., text/audio) without requiring paired training data. The approach uses distributional alignment rather than sample-level matching, establishing theoretical foundations that improve efficiency in multimodal machine learning.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Researchers demonstrate that Large Language Models can maintain safe behavioral outputs while remaining vulnerable to manipulation at the representation level, revealing a critical gap in current safety evaluation methods. The study introduces the Latent Vulnerability Score to measure susceptibility to harmful behavior through latent space interventions, showing that behavioral safety metrics alone provide incomplete robustness assessment.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Beyond Accuracy: Interpreting Topic Representation in Suicide Ideation Detection Models

Researchers demonstrate that suicide ideation detection models trained with topic-augmented datasets develop more interpretable internal representations of psychological risk factors. The study moves beyond standard accuracy metrics to examine how AI systems encode mental health concepts, revealing that augmentation clarifies underrepresented factors like immigration stress, family issues, and financial crisis.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

Researchers introduce Hypnos, a multi-modal foundation model trained on next-token prediction that learns generalizable representations of sleep physiology from over 20,000 polysomnography recordings across eight sensing modalities. The model achieves performance parity with supervised baselines on sleep stage classification while using 100× less labeled data and demonstrates cross-domain generalization by outperforming specialized models on daytime cardiac tasks.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show

Researchers demonstrate that video diffusion models internally encode physical plausibility without explicit training to do so, achieving 81% accuracy in decoding physical validity from model states. This finding suggests generative AI systems develop meaningful representations of physics as an emergent property of the denoising process rather than through supervised learning.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

Researchers demonstrate that representation learning, rather than model-based planning, is the key driver of scalable multitask reinforcement learning. Their proposed MR.Q algorithm combines predictive representations with value function approximation to outperform existing world-model methods while reducing computational overhead.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

OPRD: On-Policy Representation Distillation

Researchers propose On-Policy Representation Distillation (OPRD), a novel method for training smaller AI models by aligning hidden-state representations with teacher models rather than just matching output probabilities. OPRD achieves superior performance on mathematical reasoning benchmarks while training 1.44x faster and using 54% less memory than existing approaches.
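To make the contrast in this summary concrete, here is a minimal sketch of a loss that adds a hidden-state alignment term to the usual output-matching term. It is not the OPRD objective, which the summary does not specify: the function names, the mean-squared-error alignment term, the `alpha` weighting, and the assumption that student and teacher hidden vectors share a dimension are all illustrative choices, and the on-policy sampling step is left out entirely.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hidden_mse(student_hidden, teacher_hidden):
    """Mean squared error between two hidden-state vectors of equal length."""
    n = len(student_hidden)
    return sum((s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)) / n


def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden, alpha=0.5):
    """Hypothetical combined loss: output matching plus hidden-state alignment.

    alpha weights the output term; (1 - alpha) weights the hidden-state term.
    """
    output_term = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    hidden_term = hidden_mse(student_hidden, teacher_hidden)
    return alpha * output_term + (1 - alpha) * hidden_term
```

In a real student-teacher pair the hidden sizes usually differ, so some learned projection would be needed before the alignment term can be computed; that detail is omitted here.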

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

Researchers introduce Mirage, a representation-level auditing framework that reveals existing machine unlearning methods in federated learning fail to truly forget sensitive data despite passing output-level tests. The study demonstrates that current approaches retain substantial class structure in internal representations, exposing a critical gap between certification standards and actual data privacy.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

A Fiber Criterion for Representation Identifiability in Supervised Learning

A new theoretical framework formalizes when representation properties in supervised learning can be uniquely identified from input-output behavior alone. The research shows that representation-level claims require assumptions beyond predictive performance, because auxiliary information can be added to a representation without changing the predictor's outputs. This challenges common assumptions about what supervised learning actually determines.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

Emergent Ordinal Geometry in Transformers Trained on Local Comparisons

Researchers demonstrate that Transformers trained exclusively on adjacent comparisons spontaneously develop one-dimensional geometric structures that encode hidden rank orderings, exhibiting the symbolic distance effect observed in animal cognition. This discovery mechanistically bridges cognitive science with neural network representations, showing that decision confidence scales with ordinal distance even at ceiling accuracy.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

Global Geometry Is Not Enough for Vision Representations

Researchers demonstrate that global embedding geometry, the standard metric for evaluating vision model representations, fails to predict compositional binding capabilities. Functional sensitivity, measured through input-output Jacobians, proves far more reliable. Current training objectives optimize embedding geometry while leaving the local input-output mapping unconstrained, which suggests that representation learning needs a more nuanced evaluation framework.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

Researchers introduce DLLM-JEPA, a new self-supervised learning approach that combines Joint Embedding Predictive Architectures with masked-diffusion language models. The method eliminates the need for explicit multi-view training data and reduces computational costs by 33% compared to prior LLM-JEPA while achieving significant performance improvements across multiple benchmarks.

AI · Bearish · arXiv – CS AI · Jun 1 · 7/10

Position: Evaluation of ECG Representations Must Be Fixed

A position paper challenges current ECG representation learning benchmarking practices, arguing that evaluation methods are too narrow and miss clinically meaningful objectives. The authors demonstrate that random encoder baselines surprisingly match state-of-the-art pre-training on many tasks, suggesting the field's conclusions about model performance are unreliable without proper evaluation frameworks.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Causal-JEPA: Learning World Models through Object-Level Latent Masking

Researchers introduce Causal-JEPA (C-JEPA), an object-centric world model that uses masked latent prediction to learn interaction-dependent dynamics more effectively. The approach demonstrates significant improvements in visual reasoning tasks and enables more efficient AI planning with substantially fewer input features than existing patch-based models.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Locality-Aware Redundancy Pruning for LLM Depth Compression

Researchers propose Locality-Aware Redundancy Pruning (LoRP), a training-free method for compressing large language models by removing redundant layers based on representational similarity patterns. The framework uses a Representation Locality Score to identify and prune depth-wise redundancy more effectively than existing approaches, improving both perplexity and downstream task performance across multiple LLM architectures.
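The general idea of similarity-based depth pruning can be sketched as follows. This is a generic illustration, not the paper's Representation Locality Score, whose definition the summary does not give: the cosine-similarity criterion and the helper names are assumptions made for the example. A layer whose output is nearly identical to its input changes the representation very little, so it is treated as a pruning candidate.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_layers_by_redundancy(hidden_states):
    """Rank layers from most to least redundant.

    hidden_states[i] is the vector entering layer i and hidden_states[i + 1]
    is the vector leaving it, so a model with n layers supplies n + 1 vectors.
    Higher input-output similarity is read as higher redundancy.
    """
    scored = []
    for i in range(len(hidden_states) - 1):
        sim = cosine_similarity(hidden_states[i], hidden_states[i + 1])
        scored.append((sim, i))
    # Sort by similarity, highest first; ties keep their original layer order.
    scored.sort(key=lambda pair: -pair[0])
    return [layer for _, layer in scored]


# Toy example: layer 1 only rescales its input, so it ranks as most redundant.
states = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 2.0]]
ranking = rank_layers_by_redundancy(states)
```

In practice the similarity would be averaged over many tokens and calibration inputs rather than computed from a single vector per layer.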

AI · Bullish · arXiv – CS AI · May 12 · 7/10

MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching

Researchers introduce MC-RFM, a novel framework for efficiently adapting frozen vision models to new tasks using mixed-curvature Riemannian geometry. The method represents adapted features on a product manifold combining hyperbolic and Euclidean spaces, outperforming existing parameter-efficient adaptation techniques across multiple benchmarks and backbone architectures.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

Researchers identify critical honesty failures in Large Language Model unlearning methods, where models hallucinate or behave inconsistently after attempting to forget harmful training data. They propose ReVa, a representation-alignment procedure that significantly improves model honesty by better acknowledging forgotten knowledge while maintaining utility on retained information.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

Echo-LoRA introduces a parameter-efficient fine-tuning method that injects cross-layer representations from deeper neural network layers into shallow LoRA modules during training, achieving 3-5.7% performance improvements on reasoning tasks without adding inference costs. The technique discards its auxiliary training path post-deployment, maintaining the efficiency benefits of standard LoRA while delivering measurable capability gains.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Towards Effective Theory of LLMs: A Representation Learning Approach

Researchers introduce Representational Effective Theory (RET), a framework that interprets large language model computation through learned high-level variables rather than individual neuron activations. The approach successfully identifies meaningful mental-state trajectories, enables early prediction of behavioral patterns like sycophancy, and provides causal mechanisms for steering model outputs, suggesting LLMs can be understood and controlled through effective macroscopic descriptions.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

Researchers introduce causal dimensionality (kappa), a measurable property quantifying how transformer layers causally influence model outputs, finding that representational capacity grows 15.6x faster than causal capacity across scaling conditions. The metric remains invariant to model size increases, suggesting causal influence is a fundamental architectural property independent of parameter count.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

It Just Takes Two: Scaling Amortized Inference to Large Sets

Researchers introduce a novel training strategy for neural posterior estimation that decouples representation learning from posterior modeling, enabling amortized inference on large observation sets by training only on pairs of examples. The approach dramatically reduces computational requirements while maintaining or improving performance across diverse benchmarks, making scalable Bayesian inference practical for real-world applications.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Researchers propose a new training paradigm called ReVision that addresses the 'modality gap'—a geometric misalignment between visual and text embeddings in multimodal AI models. By introducing ReAlign, a training-free alignment strategy that leverages unpaired data statistics, the framework enables efficient scaling of multimodal large language models without requiring expensive paired image-text datasets.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions

Researchers have identified why layer pruning causes sudden performance collapse in large language models by analyzing decision representation dynamics. The study reveals that pruning disrupts a critical 'Silent Phase' where the model internally processes information before making predictions, while the subsequent 'Decisive Phase' remains robust to pruning.
