y0news

#model-internals News & Analysis

2 articles tagged with #model-internals. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 14h ago · 7/10

What do your logits know? (The answer may surprise you!)

Researchers show that the logits and other accessible outputs of vision-language models leak substantial task-irrelevant information. This creates a security risk: the information can be exposed unintentionally or extracted maliciously, despite apparent safeguards.

AI · Neutral · arXiv – CS AI · 14h ago · 6/10

Relational Preference Encoding in Looped Transformer Internal States

Researchers show that looped transformers such as Ouro-2.6B encode human preferences relationally rather than independently: pairwise evaluators reach 95.2% accuracy, compared with 21.75% for independent classification. According to the study, this encoding functions as an internal consistency probe rather than a direct predictor of human annotations.

๐Ÿข Anthropic