
What Is The Political Content in LLMs' Pre- and Post-Training Data?

arXiv – CS AI | Tanise Ceron, Dmitry Nikolaev, Dominik Stammbach, Debora Nozza
🤖 AI Summary

Research reveals that large language models exhibit political biases stemming from systematically left-leaning training data, with pre-training datasets containing more politically engaged content than post-training data. The study finds strong correlations between political stances in training data and model behavior, with biases persisting across all training stages.

Key Takeaways
  • Training data for open-source LLMs is systematically skewed toward left-leaning political content.
  • Pre-training corpora contain substantially more politically engaged material than post-training datasets.
  • Strong correlation exists between political stances in training data and resulting model behavior.
  • Political biases are already present in base models and persist through post-training stages.
  • Different curation strategies still result in similar political distributions across pre-training datasets.
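The correlation takeaway above can be made concrete with a minimal sketch. Assume (hypothetically; these numbers and the scoring scheme are illustrative, not from the study) that each political topic gets a stance score for the training corpus and for the model's outputs, with negative values left-leaning and positive values right-leaning. A Pearson correlation then quantifies how closely model behavior tracks the data:

```python
from statistics import mean, stdev

def pearson(xs, ys):
    # Pearson correlation: sample covariance normalized by the
    # product of the sample standard deviations.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical per-topic stance scores (-1 = left, +1 = right);
# illustrative values only, not data from the paper.
corpus_stance = [-0.6, -0.2, -0.4, 0.1, -0.5]
model_stance  = [-0.5, -0.1, -0.3, 0.2, -0.4]

print(round(pearson(corpus_stance, model_stance), 3))
```

A value near +1 would indicate the kind of strong data-to-behavior correlation the paper reports; the actual study uses its own stance-annotation pipeline rather than this toy scoring.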