y0news

#model-truthfulness News & Analysis

1 article tagged with #model-truthfulness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 8h ago · 7/10

Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling

Researchers report a phase transition in how language models' reasoning and truthfulness capabilities relate, occurring at around 3.5B parameters: in smaller models the two capabilities are anticorrelated, while in larger models they cooperate. This hidden alignment transition is invisible to standard loss curves but can be diagnosed from public benchmarks alone. A proof-of-concept intervention shows that adding a truth-direction vector can correct misaligned outputs without retraining.
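To make the idea of "adding a truth-direction vector" concrete, here is a minimal sketch in plain Python. The summary does not say how the paper computes its direction, so this sketch assumes a common difference-of-means approach. The function names, the toy activations, and the scaling factor `alpha` are all hypothetical and are not taken from the paper.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def truth_direction(true_acts, false_acts):
    """Unit vector from the mean 'false' activation to the mean 'true' one.

    Assumption: a difference-of-means direction; the paper may use another method.
    """
    mean_true = mean_vector(true_acts)
    mean_false = mean_vector(false_acts)
    diff = [t - f for t, f in zip(mean_true, mean_false)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("true and false activations have identical means")
    return [d / norm for d in diff]

def steer(hidden, direction, alpha):
    """Shift a hidden state by alpha along the direction; no weights change."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Scalar component of a hidden state along the (unit) direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Toy, made-up activations standing in for a model's hidden states.
true_acts = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]
false_acts = [[-1.0, 0.1, 0.0], [-0.8, 0.1, 0.1]]

direction = truth_direction(true_acts, false_acts)
hidden = [-0.5, 0.3, 0.2]  # hypothetical state leaning toward the "false" side
steered = steer(hidden, direction, alpha=2.0)
```

Because the direction has unit length, steering raises the projection of the hidden state onto it by exactly `alpha`, which is what lets an intervention of this kind shift outputs without retraining. In a real model the vector would be added to a chosen layer's activations at inference time, and the layer and the strength would need to be tuned empirically.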

🧠 Llama