y0news

#toxicity-detection News & Analysis

2 articles tagged with #toxicity-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

Researchers introduce DCR (Discernment via Contrastive Refinement), a new method to reduce over-refusal in safety-aligned large language models. The approach helps LLMs better distinguish between genuinely toxic and seemingly toxic prompts, maintaining safety while improving helpfulness without degrading general capabilities.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity

Researchers developed M-QUEST, a new benchmark for evaluating AI models' ability to understand and detect toxicity in internet memes. The framework identifies 10 key dimensions for meme interpretation and tests 8 open-source language models, finding that instruction-tuned models perform better but still struggle with pragmatic inference.