y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#harmful-content News & Analysis

2 articles tagged with #harmful-content. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBearisharXiv โ€“ CS AI ยท Mar 127/10
๐Ÿง 

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

Researchers have developed 'Amnesia,' a lightweight adversarial attack that bypasses safety mechanisms in open-weight Large Language Models by manipulating internal transformer states. The attack enables generation of harmful content without requiring fine-tuning or additional training, highlighting vulnerabilities in current LLM safety measures.

AIBullishOpenAI News ยท Sep 266/107
๐Ÿง 

Upgrading the Moderation API with our new multimodal moderation model

OpenAI has launched a new multimodal moderation model based on GPT-4o that can more accurately detect harmful content in both text and images. This upgrade to the Moderation API will enable developers to build more effective content moderation systems across platforms.