y0news

#internal-representations News & Analysis

2 articles tagged with #internal-representations. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 2d ago · 7/10

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

Researchers introduce MENTIS, a framework for measuring internal geometric changes in language models during preference alignment training. The study reveals that alignment leaves selective, depth-localized signatures in model computations, with normative concepts showing larger internal reorganization than factual concepts across multiple model architectures.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

Researchers propose a conformal prediction framework for large language models that uses internal neural representations rather than surface-level outputs to assess reliability and uncertainty. The Layer-Wise Information scoring method improves prediction validity under distribution shift while maintaining competitive performance, addressing a critical challenge in deploying LLMs where traditional uncertainty signals become unreliable.