#representation-learning News & Analysis

162 articles tagged with #representation-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

When Context Returns: Toward Robust Internalization in On-Policy Distillation

Researchers identify a critical failure mode in on-policy distillation where reintroducing privileged context (like system prompts) to a distilled student model degrades performance, even on previously solved tasks. They propose a lightweight consistency regularizer using stop-gradient anchoring and forward KL divergence to achieve 'context removability,' enabling models to internalize context while remaining stable when it reappears.
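
The summary does not give the regularizer's exact form. A minimal NumPy sketch, assuming the anchor is the student's own next-token distribution computed without the privileged context and that the forward KL runs from that anchor to the with-context distribution, could look like this. The function names are hypothetical, and the stop-gradient is only marked in comments because NumPy has no autograd.

```python
import numpy as np


def softmax(logits):
    # Subtract the per-row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def context_consistency_kl(logits_without_ctx, logits_with_ctx, eps=1e-12):
    """Forward KL(p_anchor || p_ctx), averaged over token positions.

    Assumption: the anchor is the student's distribution WITHOUT the
    privileged context. In PyTorch or JAX it would be wrapped in a
    stop-gradient (.detach() / jax.lax.stop_gradient) so that only the
    with-context branch is pulled toward it; here that is only a comment.
    """
    p_anchor = softmax(logits_without_ctx)  # fixed target (stop-gradient)
    p_ctx = softmax(logits_with_ctx)        # branch that would receive gradients
    kl = np.sum(p_anchor * (np.log(p_anchor + eps) - np.log(p_ctx + eps)), axis=-1)
    return float(kl.mean())


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 10))  # 4 positions, vocabulary of 10
print(context_consistency_kl(a, a))  # 0.0: reintroducing context changed nothing
print(context_consistency_kl(a, a + rng.normal(size=a.shape)) > 0)  # True
```

The penalty is zero exactly when the reappearing context leaves the student's predictions unchanged, which is one way to read "context removability".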

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Frozen Multimodal Embeddings for Personality and Cognitive Ability Assessment in Asynchronous Video Interviews

Researchers developed a multimodal machine learning approach using frozen pretrained encoders (CLIP, Whisper, RoBERTa) to predict personality traits and cognitive ability from asynchronous video interviews, achieving a 19.1% improvement over the baseline on personality assessment but revealing potential dataset shortcuts in the cognitive ability evaluation.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Researchers investigate feature stability in sparse autoencoders (SAEs), finding that unstable features across training runs concentrate in reproducible lower-rank subspaces rather than representing pure noise. Stable features carry most functional signal for reconstruction and prediction, while unstable features have minimal individual impact but reflect shared geometric structure that different seeds resolve differently.
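
One common way to compare SAE dictionaries across seeds, shown here as an assumption rather than the paper's protocol, is to match decoder directions by cosine similarity and, for features that fail to match, compare the subspaces they span through principal angles. The 0.9 threshold and the helper names are hypothetical.

```python
import numpy as np


def unit_rows(W):
    # Normalize each decoder direction (row) to unit length.
    return W / np.linalg.norm(W, axis=1, keepdims=True)


def stable_mask(D_a, D_b, threshold=0.9):
    """Mark features of run A that have a near-duplicate direction in run B.

    D_a, D_b: decoder matrices of shape (num_features, d_model) from two seeds.
    threshold: hypothetical cosine cut-off; the summary gives no value.
    """
    sims = unit_rows(D_a) @ unit_rows(D_b).T  # (features_a, features_b) cosines
    return sims.max(axis=1) >= threshold


def subspace_overlap(U, V, rank):
    """Mean cosine of the principal angles between the top-`rank` row spaces."""
    # Rows of vh are right singular vectors; keep the leading `rank` of them.
    Qa = np.linalg.svd(U, full_matrices=False)[2][:rank].T  # (d_model, rank)
    Qb = np.linalg.svd(V, full_matrices=False)[2][:rank].T
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    return float(np.linalg.svd(Qa.T @ Qb, compute_uv=False).mean())


rng = np.random.default_rng(0)
U = rng.normal(size=(5, 32))        # "unstable" directions found by seed A
V = rng.normal(size=(5, 5)) @ U     # seed B: a different mix of the same span
print(round(subspace_overlap(U, V, 5), 4))  # 1.0: same 5-dimensional subspace
```

In the usage lines, seed B resolves the same five-dimensional subspace into different directions, so individual features need not match even though the spanned subspace is identical, which mirrors the pattern the summary describes.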

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Implicit Neural Representations of Individual Behavior

Researchers introduce Behavioral INR, a self-supervised machine learning model that learns to identify and represent different behavioral policies from unlabeled multi-policy data by adapting implicit neural representations from computer vision. The approach shows promise in robotics, gaming, and racing datasets where mixed behaviors lack annotations, particularly excelling in continuous state-action environments with variable episode lengths.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

🧠

Making Foresight Actionable: Repurposing Representation Alignment in World Action Models

Researchers introduce AGRA, a new training objective that improves World Action Models (WAMs) for robot manipulation by aligning video diffusion features with semantic representations, addressing cases where visually plausible predictions do not translate into accurate control actions. The method sharpens the action decoder's focus on task-relevant regions and improves robustness to task-irrelevant perturbations in both in-distribution and out-of-distribution scenarios.
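
The summary does not give AGRA's formula. For orientation only, a generic representation-alignment term of the kind the title refers to projects the generative model's features and rewards cosine similarity with a frozen semantic encoder's features. The linear projection `W_proj` and the function name are assumptions.

```python
import numpy as np


def cosine_alignment_loss(diff_feats, sem_feats, W_proj):
    """Negative mean cosine similarity between projected and target features.

    diff_feats: (tokens, d_diff) features from the video diffusion backbone.
    sem_feats:  (tokens, d_sem) features from a frozen semantic encoder.
    W_proj:     (d_diff, d_sem) hypothetical linear projection head.
    """
    proj = diff_feats @ W_proj
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    tgt = sem_feats / np.linalg.norm(sem_feats, axis=1, keepdims=True)
    # Per-token cosine similarity, averaged and negated so lower is better.
    return float(-np.mean(np.sum(proj * tgt, axis=1)))


x = np.random.default_rng(0).normal(size=(6, 8))
print(round(cosine_alignment_loss(x, x, np.eye(8)), 6))  # -1.0: perfect alignment
```

The loss ranges from -1 (every token aligned) to 1 (every token anti-aligned); in training it would be added to the world model's usual objective with some weight.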

AI · Neutral · arXiv – CS AI · Jun 11 · 5/10

🧠

Latent World Recovery for Multimodal Learning with Missing Modalities

Researchers propose Latent World Recovery (LWR), a machine learning framework that handles multimodal datasets with missing data by aligning different data types in a shared latent space rather than imputing missing values. The approach shows promise for bioscience applications like cancer classification and survival prediction where heterogeneous data sources are often incomplete.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

Researchers propose a novel unsupervised anomaly detection method that directly couples representation learning with One-Class SVM through a custom loss function, addressing limitations in existing reconstruction-based and decoupled approaches. The method demonstrates effectiveness on image corruption benchmarks and clinical brain MRI lesion detection, showing robustness to domain shifts without requiring labeled anomalous data.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Geometric Metrics and LLMs: What They Measure and When They Work

Researchers systematically tested geometric metrics for evaluating large language models, finding that several popular metrics like Schatten Norm and MOM primarily measure output length rather than quality. While geometric metrics add modest discriminative value beyond standard text statistics for tasks like generator identification, they show inconsistent correlation with actual text quality measures.
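
The Schatten p-norm of a hidden-state matrix is the p-norm of its singular values. The sketch below uses synthetic unit-norm token states rather than real model activations, and only illustrates why such a metric can track output length: for p = 2 it equals the Frobenius norm, which for n unit-norm rows is exactly sqrt(n). It does not reproduce the paper's experiments.

```python
import numpy as np


def schatten_norm(H, p=1.0):
    """Schatten p-norm of a (tokens x hidden) matrix: (sum_i sigma_i^p)^(1/p)."""
    s = np.linalg.svd(H, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
for n_tokens in (16, 64, 256):
    H = rng.normal(size=(n_tokens, 128))
    H /= np.linalg.norm(H, axis=1, keepdims=True)  # unit-norm token states
    # For p = 2 this equals the Frobenius norm, i.e. sqrt(n_tokens) here;
    # the nuclear norm (p = 1) also grows as more tokens are added.
    print(n_tokens, round(schatten_norm(H, 2), 3), round(schatten_norm(H, 1), 1))
```

Because the value rises with the number of rows even when every row is equally "good", comparing raw Schatten norms across responses of different lengths mixes length with whatever the metric is meant to capture.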

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

Researchers demonstrate that task-aware layer pruning improves model performance on out-of-distribution (OOD) data while providing no benefits for in-distribution data. The improvement occurs because pruning removes layers that distort the task-adapted geometric representation, realigning OOD inputs with the model's learned task geometry.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

🧠

TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

Researchers introduce TouchThinker, a tactile-language framework designed to advance embodied AI systems by scaling tactile commonsense reasoning. The work addresses key limitations through TouchThinker-1M, a million-scale dataset covering 415 objects and 7 sensor types, and proposes action-aware representation mechanisms to improve tactile signal efficiency and semantic expressiveness.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

PermDoRA -- Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

Researchers challenge the conventional wisdom that adapter interference in language models stems from parameter-space geometry by testing whether orthogonal or directionally independent updates reduce cross-domain interference. Their findings using DoRA-RBAC on multiple LLMs show geometry-aware merging provides no consistent advantage, suggesting interference mechanisms operate in shared nonlinear representations rather than linear parameter space.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

🧠

Representation Curriculum: Stagewise Training for Robust Ranking and Allocation

Researchers propose Representation Curriculum (RC), a machine learning training method that improves ranking systems in digital marketplaces by strategically controlling when different data signals are introduced during model training. The approach reduces over-reliance on exposure-dependent historical signals and strengthens content-based merit evaluation, yielding better performance on cold-start scenarios and improved robustness across distribution shifts.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

🧠

Tractogram foundation model

Researchers introduce TractFM, a foundation model that learns reusable representations from whole-brain diffusion MRI tractography data by combining local streamline encoding with permutation-equivariant processing. The model demonstrates strong transfer learning capabilities across different tractography algorithms, datasets, and prediction tasks, achieving accurate tract parcellation and demographic predictions without task-specific fine-tuning.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning

Researchers have developed a method to detect emergent misalignment in large language models during finetuning by monitoring internal representational shifts rather than relying solely on behavioral evaluation. The technique identifies dangerous model behavior through a low-dimensional geometric signature in activation space, achieving high detection accuracy with minimal computational overhead.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

DOME: Learning Transferable Domain Variables from Sparse Supervision for Test-Time Adaptation

Researchers introduce DOME, a domain encoder that improves test-time adaptation by explicitly modeling sample-specific domain shifts rather than inferring a single global distribution. The method leverages vision-language pretraining and sparse domain banks to achieve state-of-the-art performance on multiple benchmarks, suggesting that structured domain representation outweighs algorithmic complexity.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

What Makes Video World Model Latents Action-Relevant: Prediction over Reconstruction

Researchers demonstrate that temporal video pretraining, not pixel reconstruction quality, drives action-relevant structure in video world model latent spaces. Across diverse encoder architectures, video-pretrained self-supervised models consistently outperform reconstruction-based approaches in recovering action information, with implications for developing more effective embodied AI systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

The Cross-Architecture Substrate: A Domain-Transcendent, Calibration-Surviving Geometric Invariant of Modern Vision Encoders

Researchers discovered that thirteen different vision neural networks, despite being trained for distinct tasks (classification, contrastive learning, image-text matching), converge on the same sixteen-dimensional geometric structure, which they call the 'cross-architecture substrate.' This invariant structure persists across multiple visual domains and survives calibration testing, suggesting a universal representational principle in modern vision encoders that could enable new transfer learning and distillation techniques.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

PAI: Preserving Amplitude Information in Representation-Based Time-Series Anomaly Detection

Researchers propose PAI, a novel anomaly scoring scheme that addresses a critical limitation in representation-based time-series anomaly detection by explicitly preserving amplitude information in learned embeddings. The method achieves significant performance improvements, with average gains of 98.4% on TSB-AD-U-Eva and 36.8% on TAB UV datasets, suggesting that amplitude retention is crucial for robust anomaly detection.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

Researchers analyzed whether pretrained video foundation models encode an understanding of intuitive physics by probing the frozen, layer-by-layer representations of three models (V-JEPA, VideoMAE, and LTX-Video). Results show that physics knowledge emerges reliably in intermediate-to-late layers, with V-JEPA performing strongest and temporal information proving critical for understanding physical dynamics.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

A Geometric Theory of Cognition for Machine Intelligence

Researchers propose a geometric framework for machine intelligence where cognitive computation emerges from Riemannian gradient flow on learned latent manifolds, eliminating the need for explicit memory modules. The approach demonstrates superior robustness across reinforcement learning tasks involving partial observability, sensory disruptions, and long-horizon prediction compared to feedforward baselines.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

OSMGraphCLIP is a new geospatial AI model that learns location representations from OpenStreetMap data rather than satellite imagery. The model matches or outperforms satellite-based systems on diverse tasks including climate prediction, socioeconomic analysis, and wildfire forecasting, demonstrating that map topology and semantic data alone can capture meaningful geographic patterns.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

TRL-Bench introduces a standardized benchmark for evaluating tabular data encoders across different training paradigms, releasing curated datasets and demonstrating that encoder quality is task-dependent rather than universally superior. The framework enables fair comparison of 20 models across representation-level tasks, revealing that no single encoder dominates across all scenarios.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

🧠

Geometric Second-Order Feature Correlation Learning for Self-Supervised Speech Emotion Recognition

Researchers propose a Second-Order Correlation (SOC) layer that improves speech emotion recognition by modeling feature correlations as covariance descriptors rather than treating features independently. Using Log-Euclidean mapping to preserve geometric properties, the method demonstrates superior performance on standard emotion recognition datasets compared to conventional first-order aggregation approaches.
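
A minimal sketch of Log-Euclidean covariance pooling, assuming frame-level features from a self-supervised speech encoder arrive as a (frames, dims) array: the covariance captures pairwise feature correlations, and the matrix logarithm maps it from the curved space of symmetric positive-definite matrices to a flat space where ordinary linear layers apply. The function name and the `eps` regularizer are assumptions, not details of the SOC layer.

```python
import numpy as np


def log_euclidean_covariance(X, eps=1e-5):
    """Second-order descriptor of frame features X with shape (frames, dims)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    C = Xc.T @ Xc / (X.shape[0] - 1)      # (dims, dims) sample covariance
    C = C + eps * np.eye(C.shape[0])      # regularize so C is strictly SPD
    w, V = np.linalg.eigh(C)              # C = V diag(w) V^T with w > 0
    log_C = (V * np.log(w)) @ V.T         # matrix logarithm: SPD -> symmetric
    iu = np.triu_indices(C.shape[0])
    return log_C[iu]                      # upper triangle as a flat vector


rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 8))        # synthetic stand-in for encoder output
print(log_euclidean_covariance(frames).shape)  # (36,)
```

A first-order pooling layer would keep only the 8 per-dimension means; the descriptor above instead keeps all 36 distinct entries of the log-covariance, which is where the correlation information lives.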

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

🧠

Breaking the Lock-in: Diversifying Text-to-Image Generation via Representation Modulation

Researchers present DAVE, a training-free method that enhances diversity in text-to-image generation by attenuating the DC (zero-frequency) component of intermediate Transformer features during early generation stages. The technique addresses the problem of identical outputs from the same prompt without requiring expensive sampling overhead or auxiliary optimization.
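
The summary does not specify how DAVE attenuates the DC component. Assuming features are laid out as (tokens, channels) and the zero frequency is taken along the token axis, the DC component is the per-channel mean, and scaling it reduces to subtracting part of that mean. `alpha`, `early_steps`, and both function names are hypothetical.

```python
import numpy as np


def attenuate_dc(H, alpha=0.5):
    """Scale the zero-frequency (DC) component of token features by alpha.

    H: (num_tokens, channels) intermediate features. Along the token axis the
    DC component is the per-channel mean, so removing (1 - alpha) of that mean
    scales FFT bin 0 by alpha and leaves every other frequency bin untouched.
    """
    dc = H.mean(axis=0, keepdims=True)
    return H - (1.0 - alpha) * dc


def maybe_modulate(H, step, early_steps=10, alpha=0.5):
    # Hypothetical schedule: intervene only during the first `early_steps`
    # denoising steps, mirroring the summary's "early generation stages".
    return attenuate_dc(H, alpha) if step < early_steps else H


rng = np.random.default_rng(0)
H = rng.normal(size=(64, 16)) + 3.0   # tokens share a strong common offset
F_before = np.fft.fft(H, axis=0)
F_after = np.fft.fft(attenuate_dc(H, alpha=0.2), axis=0)
print(np.allclose(F_after[0], 0.2 * F_before[0]))  # True
print(np.allclose(F_after[1:], F_before[1:]))      # True
```

For an image whose tokens form a 2D grid, the mean over all tokens is also bin (0, 0) of a 2D FFT, so the same subtraction applies without computing any transform.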

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

🧠

SV-Detect: AI-generated Text Detection with Steering Vectors

Researchers have developed SV-Detect, an AI detection system using steering vectors extracted from language model hidden layers to distinguish human-written from machine-generated text. The method demonstrates robust performance across domain shifts, different source models, and edited content, positioning fake-text detection as a representation-space probing problem rather than surface-level analysis.
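
A common way to build a steering vector is to take the difference between the mean activations of two classes at one layer. The sketch assumes that construction, one pooled activation vector per text, and a midpoint threshold; SV-Detect's actual extraction and scoring may differ, and the synthetic data below stands in for real hidden states.

```python
import numpy as np


def fit_steering_detector(h_human, h_machine):
    """Difference-of-means direction plus a midpoint threshold.

    h_human, h_machine: (num_texts, hidden) pooled activations from one layer.
    """
    v = h_machine.mean(axis=0) - h_human.mean(axis=0)
    v = v / np.linalg.norm(v)
    # Threshold halfway between the two classes' mean projections.
    thr = 0.5 * (h_human.mean(axis=0) @ v + h_machine.mean(axis=0) @ v)
    return v, thr


def predict_machine(h, v, thr):
    # Project each text's activation onto the steering direction and threshold.
    return h @ v > thr


rng = np.random.default_rng(0)
shift = np.zeros(64)
shift[0] = 4.0  # synthetic stand-in for a "machine-text" direction
h_human = rng.normal(size=(200, 64))
h_machine = rng.normal(size=(200, 64)) + shift
v, thr = fit_steering_detector(h_human, h_machine)
correct = np.concatenate([~predict_machine(h_human, v, thr), predict_machine(h_machine, v, thr)])
print(round(float(correct.mean()), 3))  # training accuracy on the synthetic data
```

Framing detection this way reduces it to a single dot product per text, which is what makes it a probing problem in representation space rather than an analysis of surface features.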
