y0news

#representation-learning News & Analysis

85 articles tagged with #representation-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Causal-JEPA: Learning World Models through Object-Level Latent Masking

Researchers introduce Causal-JEPA (C-JEPA), an object-centric world model that uses masked latent prediction to learn interaction-dependent dynamics more effectively. The approach demonstrates significant improvements in visual reasoning tasks and enables more efficient AI planning with substantially fewer input features than existing patch-based models.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Locality-Aware Redundancy Pruning for LLM Depth Compression

Researchers propose Locality-Aware Redundancy Pruning (LoRP), a training-free method for compressing large language models by removing redundant layers based on representational similarity patterns. The framework uses a Representation Locality Score to identify and prune depth-wise redundancy more effectively than existing approaches, improving both perplexity and downstream task performance across multiple LLM architectures.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

Researchers introduce causal dimensionality (kappa), a measurable property quantifying how transformer layers causally influence model outputs, finding that representational capacity grows 15.6x faster than causal capacity across scaling conditions. The metric remains invariant to model size increases, suggesting causal influence is a fundamental architectural property independent of parameter count.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Towards Effective Theory of LLMs: A Representation Learning Approach

Researchers introduce Representational Effective Theory (RET), a framework that interprets large language model computation through learned high-level variables rather than individual neuron activations. The approach successfully identifies meaningful mental-state trajectories, enables early prediction of behavioral patterns like sycophancy, and provides causal mechanisms for steering model outputs, suggesting LLMs can be understood and controlled through effective macroscopic descriptions.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching

Researchers introduce MC-RFM, a novel framework for efficiently adapting frozen vision models to new tasks using mixed-curvature Riemannian geometry. The method represents adapted features on a product manifold combining hyperbolic and Euclidean spaces, outperforming existing parameter-efficient adaptation techniques across multiple benchmarks and backbone architectures.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

Researchers identify critical honesty failures in Large Language Model unlearning methods, where models hallucinate or behave inconsistently after attempting to forget harmful training data. They propose ReVa, a representation-alignment procedure that significantly improves model honesty by better acknowledging forgotten knowledge while maintaining utility on retained information.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

Echo-LoRA is a parameter-efficient fine-tuning method that injects cross-layer representations from deeper network layers into shallow LoRA modules during training, achieving 3-5.7% performance improvements on reasoning tasks without adding inference cost. The auxiliary training path is discarded after training, so the method keeps the efficiency of standard LoRA while delivering measurable capability gains.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization

Researchers demonstrate that neural networks fail at out-of-distribution (OOD) generalization not because of insufficient training data, but because the choice of feature representation determines which extrapolation patterns a model can learn. The same architecture, reaching identical in-distribution loss, can differ by 520x out of distribution depending on how features are encoded. Correct feature engineering is therefore necessary, but it is not sufficient without appropriate model class constraints.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Researchers propose a new training paradigm called ReVision that addresses the 'modality gap'—a geometric misalignment between visual and text embeddings in multimodal AI models. By introducing ReAlign, a training-free alignment strategy that leverages unpaired data statistics, the framework enables efficient scaling of multimodal large language models without requiring expensive paired image-text datasets.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

It Just Takes Two: Scaling Amortized Inference to Large Sets

Researchers introduce a novel training strategy for neural posterior estimation that decouples representation learning from posterior modeling, enabling amortized inference on large observation sets by training only on pairs of examples. The approach dramatically reduces computational requirements while maintaining or improving performance across diverse benchmarks, making scalable Bayesian inference practical for real-world applications.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions

Researchers have identified why layer pruning causes sudden performance collapse in large language models by analyzing decision representation dynamics. The study reveals that pruning disrupts a critical 'Silent Phase' where the model internally processes information before making predictions, while the subsequent 'Decisive Phase' remains robust to pruning.

AI · Bearish · arXiv – CS AI · May 4 · 7/10

Language Models Struggle to Use Representations Learned In-Context

A new study finds that large language models struggle to use representations they learn from in-context information, even though they can encode this information internally. The findings suggest current LLMs have fundamental limitations in adapting to novel contexts, affecting their ability to generalize learned patterns to downstream tasks.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Do Sparse Autoencoders Capture Concept Manifolds?

Researchers demonstrate that sparse autoencoders (SAEs) capture semantic concepts along low-dimensional manifolds rather than isolated linear directions, revealing that existing architectures suboptimally recover these continuous structures through a fragmented approach called dilution. The findings suggest future interpretability methods should treat geometric objects as fundamental units rather than individual feature directions.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining

Researchers have developed a method using sparse crosscoders to track how large language models learn linguistic concepts during training, introducing a new metric called Relative Indirect Effects (RelIE) to identify when specific features become causally important. This approach provides interpretable, fine-grained visibility into representation learning throughout pretraining, advancing understanding of how LLMs acquire abstract capabilities.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers demonstrate that expert specialization in Mixture-of-Experts (MoE) large language models emerges from hidden state geometry rather than specialized routing architecture, challenging assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that understanding MoE specialization remains as difficult as the broader unsolved problem of interpreting LLM internal representations.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

Social-JEPA: Emergent Geometric Isomorphism

Researchers developed Social-JEPA, showing that separate AI agents learning from different viewpoints of the same environment develop internal representations that are mathematically aligned through approximate linear isometry. This enables models trained on one agent to work on another without retraining, suggesting a path toward interoperable decentralized AI vision systems.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

SiNGER: A Clearer Voice Distills Vision Transformers Further

Researchers introduce SiNGER, a new knowledge distillation framework for Vision Transformers that suppresses harmful high-norm artifacts while preserving informative signals. The technique uses nullspace-guided perturbation and LoRA-based adapters to achieve state-of-the-art performance in downstream tasks.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity

Researchers developed a method to improve EEG-based music identification by using artificial neural networks that distinguish between acoustic and expectation-related brain representations. The approach combines both types of neural representations to achieve better performance than traditional methods, potentially advancing brain-computer interfaces and neural decoding applications.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 3

Unsupervised Representation Learning -- an Invariant Risk Minimization Perspective

Researchers propose a new unsupervised framework for Invariant Risk Minimization (IRM) that learns robust representations without labeled data. The approach introduces two methods, Principal Invariant Component Analysis (PICA) and Variational Invariant Autoencoder (VIAE), that capture invariant structure across different environments using only unlabeled data.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 2

Why Does RLAIF Work At All?

Researchers propose the 'latent value hypothesis' to explain why Reinforcement Learning from AI Feedback (RLAIF) enables language models to self-improve through their own preference judgments. The theory suggests that pretraining on internet-scale data encodes human values in representation space, which constitutional prompts can elicit for value alignment.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks

Researchers propose that intrinsic task symmetries drive 'grokking', the sudden transition from memorization to generalization in neural networks. The study identifies a three-stage training process and introduces diagnostic tools to predict and accelerate the onset of generalization in algorithmic reasoning tasks.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence

Researchers propose the Compression Efficiency Principle (CEP) to explain why artificial neural networks and biological brains develop similar representations despite different substrates. The theory suggests both systems converge on efficient compression strategies that encode stable invariants rather than unstable correlations, providing a unified framework for understanding intelligence across biological and artificial systems.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5

VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations

Researchers have developed VQ-Style, a new AI method that uses Residual Vector Quantized Variational Autoencoders to separate style from content in human motion data. The technique enables effective motion style transfer without requiring fine-tuning for new styles, with applications in animation, gaming, and digital content creation.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Representation Alignment Rests on Linear Structure

Researchers propose that representation alignment across AI models stems from linear encoding of object-attribute relationships, with quality determined by signal strength, architectural bias, and training noise. The study demonstrates that sparse autoencoders extract these linear features more effectively than dense models, and that data scarcity significantly impacts cross-model alignment in both language and embedding models.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

Researchers demonstrate that VAE-based world models develop organized spatial semantic representations through physical exploration alone, without linguistic input. The geometric structure of the physical world emerges as the primary organizing principle, with prediction performance and semantic alignment improving together across training, suggesting a shared underlying mechanism.
