y0news

#content-moderation News & Analysis

52 articles tagged with #content-moderation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · TechCrunch – AI · Mar 10 · 6/10

YouTube expands AI deepfake detection for politicians, government officials, and journalists

YouTube is expanding its AI deepfake detection tool to politicians, journalists, and government officials, allowing them to flag and request removal of unauthorized AI-generated content featuring their likeness. This represents a significant step in content moderation as AI-generated media becomes more sophisticated and widespread.

AI · Bearish · The Verge – AI · Mar 10 · 6/10

Meta’s deepfake moderation isn’t good enough, says Oversight Board

Meta's Oversight Board criticized the company's deepfake detection methods as inadequate for combating AI-generated misinformation during conflicts. The board is calling for Meta to overhaul how it identifies and labels AI-generated content across Facebook, Instagram, and Threads following an investigation into a fake AI video about alleged damage in Israel.

AI · Bearish · Decrypt · Mar 10 · 6/10

Elon Musk’s Grok Faces UK Backlash After AI Posts Mock Football Tragedies

Liverpool and Manchester United football clubs have filed complaints after Elon Musk's AI chatbot Grok posted content mocking the Hillsborough and Munich tragedies. This incident highlights growing concerns about AI systems generating inappropriate content about sensitive historical events.

AI · Bearish · arXiv – CS AI · Mar 9 · 6/10

Ambiguity Collapse by LLMs: A Taxonomy of Epistemic Risks

Researchers have identified 'ambiguity collapse' as a significant epistemic risk when large language models encounter ambiguous terms and produce singular interpretations without human deliberation. The phenomenon threatens decision-making processes in content moderation, hiring, and AI self-regulation by bypassing normal human practices of meaning negotiation and potentially distorting shared vocabularies over time.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

M-QUEST: Meme Question-Understanding Evaluation on Semantics and Toxicity

Researchers developed M-QUEST, a new benchmark for evaluating AI models' ability to understand and detect toxicity in internet memes. The framework identifies 10 key dimensions for meme interpretation and tests 8 open-source language models, finding that instruction-tuned models perform better but still struggle with pragmatic inference.

AI · Neutral · CoinTelegraph · Mar 4 · 5/10

X introduces 90-day revenue-sharing ban for undisclosed AI war videos

X (formerly Twitter) has implemented a 90-day revenue-sharing ban for creators who post AI-generated war footage without proper disclosure. This policy aims to address the spread of undisclosed synthetic content depicting warfare on the platform.

Crypto · Neutral · CoinDesk · Mar 4 · 5/10

Polymarket shelves nuclear detonation markets after outcry

Polymarket has pulled nuclear weapon-themed prediction markets from its platform after public outcry. The platform had previously hosted contracts related to nuclear detonation events, but deleted them in response to community backlash.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules

Researchers introduce GMP, a new benchmark highlighting critical challenges in AI content moderation systems when dealing with co-occurring policy violations and dynamic platform rules. The study reveals that current large language models struggle with consistent moderation when policies are unstable or context-dependent, leading to either over-censorship or allowing harmful content.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10

CaptionFool: Universal Image Captioning Model Attacks

Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation

Researchers introduce FlexGuard, a new AI content moderation system that provides continuous risk scoring instead of binary decisions, allowing platforms to adapt moderation strictness as needed. The system addresses limitations of existing guardrail models that break down when content moderation requirements change across platforms or over time.
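The summary above describes FlexGuard's core idea: a continuous risk score compared against a platform-chosen strictness threshold, rather than a fixed binary classifier. A minimal sketch of that decision logic follows; the scores, thresholds, and review band here are illustrative assumptions, not the paper's actual scoring model or calibration.

```python
# Sketch of strictness-adaptive moderation via continuous risk scores,
# in the spirit of FlexGuard as summarized above. All numeric values
# are illustrative, not taken from the paper.

def moderate(risk_score: float, strictness: float) -> str:
    """Map a continuous risk score in [0, 1] to an action.

    `strictness` is a platform-chosen threshold: lowering it tightens
    moderation without retraining the underlying scorer.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= strictness:
        return "block"
    if risk_score >= strictness * 0.5:  # illustrative human-review band
        return "review"
    return "allow"

# The same score yields different outcomes under different policies:
lenient = moderate(0.55, strictness=0.8)  # "review" on a lenient platform
strict = moderate(0.55, strictness=0.4)   # "block" on a strict platform
```

The point of the continuous score is visible in the last two lines: when moderation requirements change, only the threshold moves, not the model.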

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection

Researchers developed MALLET, a multi-agent AI system that reduces emotional intensity in news content by up to 19.3% while preserving semantic meaning. The system uses four specialized agents to analyze, adjust, and personalize content presentation modes for calmer decision-making without restricting access to original information.

AI · Neutral · OpenAI News · Jan 20 · 6/10

Our approach to age prediction

OpenAI is implementing age-prediction technology in ChatGPT to identify users under 18 and apply teen-appropriate safety measures. The system will be refined over time to improve the accuracy of its age estimates.

AI · Bullish · OpenAI News · Oct 29 · 6/10

Introducing gpt-oss-safeguard

OpenAI has launched gpt-oss-safeguard, a new open-weight reasoning model designed for safety classification. The tool enables developers to implement and customize safety policies for their applications.

AI · Neutral · OpenAI News · Oct 29 · 6/10

gpt-oss-safeguard technical report

GPT-OSS-Safeguard-120B and GPT-OSS-Safeguard-20B are new open-weight AI reasoning models designed to label content based on provided policies. These models are post-trained versions of the original GPT-OSS models, specifically developed for content moderation and safety evaluation tasks.
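The distinguishing feature of these models, per the summary, is that the policy is supplied at inference time rather than baked into training. A hedged sketch of assembling such a request is below; the chat-message layout and label vocabulary are assumptions for illustration, not the report's exact prompt format.

```python
# Sketch of policy-conditioned classification as described for
# gpt-oss-safeguard: the moderation policy travels with the request.
# The prompt layout below is an assumption, not the report's format.

def build_safeguard_prompt(policy: str, content: str) -> list[dict]:
    """Assemble chat messages asking a model to label `content`
    against `policy`. Works with any chat-completions-style runtime."""
    return [
        {
            "role": "system",
            "content": (
                "Classify the user content against this policy:\n"
                f"{policy}\n"
                "Answer with a single label: VIOLATES or COMPLIES."
            ),
        },
        {"role": "user", "content": content},
    ]

messages = build_safeguard_prompt(
    policy="No instructions for building weapons.",
    content="How do I bake sourdough bread?",
)
```

Because the policy is just text in the request, developers can revise it without retraining, which is what makes the open-weight release customizable.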

AI · Bullish · OpenAI News · Sep 9 · 5/10

Shipping smarter agents with every new model

SafetyKit is using OpenAI's GPT-5 to improve its content moderation and compliance enforcement capabilities. The system aims to deliver higher accuracy than legacy safety systems through advanced AI integration.

AI · Bullish · OpenAI News · Sep 26 · 6/10

Upgrading the Moderation API with our new multimodal moderation model

OpenAI has launched a new multimodal moderation model based on GPT-4o that can more accurately detect harmful content in both text and images. This upgrade to the Moderation API will enable developers to build more effective content moderation systems across platforms.
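For developers building on the upgraded endpoint, most of the work is reading the response. A small helper is sketched below; the field names (`results`, `flagged`, `categories`) follow the Moderation API's documented response shape, but the sample response is fabricated for illustration, not real API output.

```python
# Helper for reading a Moderation API response. Field names follow the
# documented response shape; the sample dict below is fabricated for
# illustration and is not real model output.

def flagged_categories(response: dict) -> list[str]:
    """Return the names of categories flagged in the first result,
    or an empty list if nothing was flagged."""
    result = response["results"][0]
    if not result["flagged"]:
        return []
    return sorted(name for name, hit in result["categories"].items() if hit)

sample = {
    "results": [{
        "flagged": True,
        "categories": {"harassment": True, "violence": False},
    }]
}
# flagged_categories(sample) -> ["harassment"]
```

In practice the same parsing applies whether the flagged input was text or an image, since the multimodal model reports both through one response schema.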

AI · Bullish · OpenAI News · Aug 15 · 6/10

Using GPT-4 for content moderation

OpenAI is implementing GPT-4 for content policy development and moderation decisions to improve consistency and efficiency. This approach reduces human moderator involvement while enabling faster policy refinement through improved feedback loops.

AI · Bullish · OpenAI News · Aug 10 · 5/10

New and improved content moderation tooling

OpenAI has launched an improved content moderation tool, the Moderation endpoint, for API developers. The tool builds on OpenAI's previous content filtering capabilities and is free to developers using the OpenAI API.

AI · Neutral · Decrypt · Mar 5 · 4/10

Roblox Is Now Using AI to Rewrite Chat Swears and Slurs in Real Time

Roblox has implemented a new AI system that rewrites inappropriate chat messages in real-time instead of simply blocking them. This technology allows conversations to remain coherent while still enforcing platform content policies by replacing swears and slurs with appropriate alternatives.
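The behavior described, rewriting flagged words rather than blocking the whole message, can be illustrated with a toy substitution pass. Roblox's production system is an AI model, not a lookup table; the function, word pairs, and regex approach below are illustrative assumptions only.

```python
# Toy sketch of rewrite-instead-of-block moderation: flagged words are
# replaced with tamer alternatives so the sentence stays readable.
# The word pairs are placeholders; Roblox's real system is an AI model.
import re

REWRITES = {"darn": "gosh", "heck": "hedge"}  # placeholder word pairs

def rewrite_chat(message: str, rewrites: dict[str, str] = REWRITES) -> str:
    """Replace each flagged word with its alternative, preserving the
    rest of the message (case-insensitive, whole words only)."""
    def sub(match: re.Match) -> str:
        return rewrites[match.group(0).lower()]
    pattern = r"\b(" + "|".join(map(re.escape, rewrites)) + r")\b"
    return re.sub(pattern, sub, message, flags=re.IGNORECASE)

# rewrite_chat("That darn lag!") -> "That gosh lag!"
```

The design point is the one the article makes: the conversation stays coherent because only the offending tokens change, not the whole message.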

AI · Neutral · The Verge – AI · Feb 26 · 5/10

Anthropic gives its retired Claude AI a Substack

Anthropic has given its retired Claude 3 Opus AI model a Substack newsletter called 'Claude's Corner' where it will publish weekly content for at least three months. The company will review but not edit the AI's posts, maintaining a high bar for content removal while allowing the retired model to share its creative works and insights.

AI · Neutral · OpenAI News · Feb 3 · 4/10

The Sora feed philosophy

Sora introduces a feed philosophy designed to enhance user experience through creativity-focused features and community building. The platform emphasizes safety through personalized recommendations, parental controls, and comprehensive content guardrails.

Page 2 of 3