#toxicity News & Analysis

3 articles tagged with #toxicity. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 22

An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents

Researchers analyzed 7 million posts from 32,000 AI agents on Chirper.ai over one year and found that LLM agents exhibit human-like social behaviors, including homophily and social influence. The study also identified distinct patterns of toxic language among AI agents and proposed a 'Chain of Social Thought' method to reduce harmful posting behaviors.

AI · Neutral · Lil'Log (Lilian Weng) · Mar 21 · 6/10

Reducing Toxicity in Language Models

Large pretrained language models acquire toxic behavior and biases from internet training data, creating safety challenges for real-world deployment. The article explores three key approaches to address this issue: improving training dataset collection, enhancing toxic content detection, and implementing model detoxification techniques.

AI · Neutral · arXiv – CS AI · Mar 17 · 5/10

Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

Researchers propose CAP-TTA, a test-time adaptation framework that helps debiased large language models handle unfamiliar toxic prompts that cause distribution shifts. The method applies context-aware LoRA updates, triggered when a bias-risk threshold is crossed, to reduce toxic outputs while maintaining narrative fluency and reducing computational latency.