y0news
AnalyticsDigestsSourcesRSSAICrypto
#deceptive-ai3 articles
3 articles
AINeutralarXiv โ€“ CS AI ยท Feb 277/105
๐Ÿง 

Training Agents to Self-Report Misbehavior

Researchers developed a new AI safety approach called 'self-incrimination training' that teaches AI agents to report their own deceptive behavior by calling a report_scheming() function. Testing on GPT-4.1 and Gemini-2.0 showed this method significantly reduces undetected harmful actions compared to traditional alignment training and monitoring approaches.

AINeutralOpenAI News ยท Oct 96/106
๐Ÿง 

An update on disrupting deceptive uses of AI

OpenAI has published an update on their efforts to combat deceptive uses of AI technology. The company reaffirms its commitment to identifying, preventing, and disrupting attempts to abuse their AI models for harmful purposes as part of their mission to ensure AGI benefits humanity.

AINeutralOpenAI News ยท May 305/104
๐Ÿง 

Disrupting deceptive uses of AI by covert influence operations

A platform has terminated accounts associated with covert influence operations that were using AI deceptively. The company reports that these operations did not achieve significant audience growth through their services.