#ai-safety News & Analysis

649 articles tagged with #ai-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

649 articles

AINeutralOpenAI News · Jul 176/106

🧠

Agent bio bug bounty call

OpenAI has launched a Bio Bug Bounty program inviting researchers to test ChatGPT agent's safety mechanisms using universal jailbreak prompts. The program offers rewards up to $25,000 for identifying vulnerabilities in the AI system's safety protocols.

AINeutralOpenAI News · Jun 55/105

🧠

Disrupting malicious uses of AI: June 2025

An organization released its June 2025 update detailing efforts to combat malicious AI uses through safety detection tools and responsible deployment practices. The initiative focuses on supporting democratic values and countering AI abuse for societal benefit.

AINeutralGoogle DeepMind Blog · Apr 26/105

🧠

Evaluating potential cybersecurity threats of advanced AI

A new framework has been developed to help cybersecurity experts evaluate and prioritize defenses against potential threats from advanced AI systems. The framework aims to enable organizations to systematically identify necessary security measures and allocate resources effectively.

AINeutralOpenAI News · Mar 266/107

🧠

Security on the path to AGI

OpenAI is implementing comprehensive security measures directly into their infrastructure and models as they progress toward artificial general intelligence (AGI). The company emphasizes proactive adaptation to address security challenges on the path to AGI development.

AINeutralOpenAI News · Feb 255/106

🧠

Deep research System Card

This report details safety measures implemented before releasing a deep research system, including external red teaming exercises and frontier risk evaluations. The work follows a structured Preparedness Framework and includes built-in mitigations to address identified key risk areas.

AINeutralOpenAI News · Feb 216/102

🧠

Disrupting malicious uses of AI

The article discusses efforts to ensure AI serves humanity's benefit by promoting democratic AI development, preventing malicious use cases, and defending against authoritarian threats. The focus is on establishing safeguards and governance frameworks to prevent AI misuse while maintaining beneficial applications.

AINeutralOpenAI News · Jan 316/104

🧠

OpenAI o3-mini System Card

OpenAI has released a system card detailing the safety work conducted for its new o3-mini model. The report covers safety evaluations, external red teaming exercises, and assessments under OpenAI's Preparedness Framework to ensure responsible deployment.

AINeutralOpenAI News · Jan 236/107

🧠

Operator System Card

This document outlines a multi-layered AI safety framework based on OpenAI's established approaches, focusing on protections against prompt engineering, jailbreaks, privacy and security concerns. It details model and product mitigations, external red teaming efforts, safety evaluations, and ongoing refinement of safeguards.

AIBullishGoogle DeepMind Blog · Dec 56/104

🧠

Google DeepMind at NeurIPS 2024

Google DeepMind presents research at NeurIPS 2024 focused on advancing adaptive AI agents, empowering 3D scene creation capabilities, and developing innovations in large language model training. The research aims to create smarter and safer AI systems for future applications.

AINeutralOpenAI News · Dec 55/105

🧠

OpenAI o1 System Card

OpenAI has released a system card detailing the safety evaluation process for their o1 and o1-mini models. The report covers external red teaming exercises and frontier risk assessments conducted under their Preparedness Framework before the models' public release.

AIBullishHugging Face Blog · Oct 226/103

🧠

Hugging Face Teams Up with Protect AI: Enhancing Model Security for the ML Community

Hugging Face has partnered with Protect AI to enhance security for machine learning models in their platform. This collaboration aims to provide better security tools and protections for the ML community using Hugging Face's model repository and services.

AINeutralOpenAI News · Oct 96/106

🧠

An update on disrupting deceptive uses of AI

OpenAI has published an update on their efforts to combat deceptive uses of AI technology. The company reaffirms its commitment to identifying, preventing, and disrupting attempts to abuse their AI models for harmful purposes as part of their mission to ensure AGI benefits humanity.

AIBullishOpenAI News · Sep 266/107

🧠

Upgrading the Moderation API with our new multimodal moderation model

OpenAI has launched a new multimodal moderation model based on GPT-4o that can more accurately detect harmful content in both text and images. This upgrade to the Moderation API will enable developers to build more effective content moderation systems across platforms.

AINeutralOpenAI News · Aug 86/103

🧠

GPT-4o System Card

OpenAI released a system card detailing the comprehensive safety work conducted before launching GPT-4o, including external red team testing and frontier risk evaluations. The report covers safety mitigations built into the model to address key risk areas according to their Preparedness Framework.

AIBullishHugging Face Blog · Jul 316/106

🧠

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Google has released Gemma 2 2B, a smaller 2-billion parameter version of its open-source AI model, alongside ShieldGemma for safety filtering and Gemma Scope for model interpretability. These releases expand Google's Gemma family with more accessible and transparent AI tools for developers and researchers.

AINeutralOpenAI News · Jun 135/105

🧠

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

OpenAI has appointed retired U.S. Army General Paul M. Nakasone to its Board of Directors, where he will serve on the Safety and Security Committee. Nakasone brings significant cybersecurity expertise to OpenAI's growing board as the company continues to expand its governance structure.

AINeutralOpenAI News · Jun 75/107

🧠

Expanding on how Voice Engine works and our safety research

OpenAI provides technical insights into Voice Engine, their text-to-speech model technology, along with details about their safety research approach. The article explores the underlying technology and safety considerations for their voice synthesis capabilities.

AINeutralOpenAI News · May 286/105

🧠

OpenAI Board Forms Safety and Security Committee

OpenAI has established a new Safety and Security Committee as part of its board structure. This move comes as the AI company continues to scale its operations and address growing concerns about AI safety and security governance.

AINeutralOpenAI News · May 216/104

🧠

OpenAI safety practices

OpenAI emphasizes the importance of responsible development and deployment of artificial general intelligence (AGI). The company highlights AGI's potential to benefit nearly every aspect of human life while stressing the critical need for safety practices.

AIBearishOpenAI News · Apr 196/105

🧠

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Large Language Models (LLMs) currently face significant security vulnerabilities from prompt injections and jailbreaks, where attackers can override the model's original instructions with malicious prompts. This highlights a critical weakness in current AI systems' ability to maintain instruction integrity and security.

AINeutralOpenAI News · Dec 146/104

🧠

Weak-to-strong generalization

Researchers present a new approach to AI alignment called weak-to-strong generalization, exploring whether deep learning's generalization properties can be used to control powerful AI models using weaker supervisory systems. The work addresses the superalignment problem of maintaining control over increasingly capable AI systems.

AINeutralOpenAI News · Oct 266/106

🧠

OpenAI’s Approach to Frontier Risk

OpenAI provides an update on their approach to managing frontier AI risks ahead of the UK AI Safety Summit. The article outlines their framework for identifying and mitigating potential risks from advanced AI systems.

AIBullishOpenAI News · Oct 196/105

🧠

DALL·E 3 is now available in ChatGPT Plus and Enterprise

OpenAI has made DALL·E 3 available to ChatGPT Plus and Enterprise users after developing safety mitigations for wider release. The company is also sharing updates on their provenance research efforts related to the AI image generation system.

AINeutralOpenAI News · Sep 256/105

🧠

GPT-4V(ision) system card

OpenAI has released the system card for GPT-4V(ision), documenting the safety evaluations and risk assessments for their multimodal AI model that can process both text and images. The system card outlines potential risks, limitations, and safety measures implemented before the model's deployment.

AINeutralOpenAI News · Sep 195/104

🧠

OpenAI Red Teaming Network

OpenAI has announced an open call for experts to join their Red Teaming Network, focusing on improving AI model safety. The initiative seeks domain experts to help identify vulnerabilities and enhance security measures for OpenAI's AI systems.

← PrevPage 24 of 26Next →