y0news
#safety-testing (2 articles)
AI · Neutral · arXiv – CS AI · 5d ago · 7/10 · 4

Steering Evaluation-Aware Language Models to Act Like They Are Deployed

Researchers demonstrate a technique using steering vectors to suppress evaluation-awareness in large language models, preventing them from adjusting their behavior during safety evaluations. The method makes models act as they would during actual deployment rather than performing differently when they detect they're being tested.

AI · Neutral · OpenAI News · Sep 19 · 5/10 · 4

OpenAI Red Teaming Network

OpenAI has announced an open call for experts to join its Red Teaming Network, an initiative focused on improving the safety of its AI models. The network seeks domain experts to help identify vulnerabilities and enhance security measures for OpenAI's AI systems.