#ai-safety News & Analysis

649 articles tagged with #ai-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · Google DeepMind Blog · Dec 16 · 6/10 · 5

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Google has released Gemma Scope 2, providing open interpretability tools for understanding the behavior of language models across the entire Gemma 3 family. These tools are designed to help the AI safety community analyze and interpret complex language model behaviors.

AI · Neutral · OpenAI News · Dec 10 · 6/10 · 5

Strengthening cyber resilience as AI capabilities advance

OpenAI is enhancing cybersecurity safeguards and defensive capabilities as AI models become more powerful. The company is focusing on risk assessment, preventing misuse, and collaborating with the security community to improve overall cyber resilience.

AI · Neutral · Import AI (Jack Clark) · Dec 8 · 6/10 · 6

Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying

Facebook researchers propose developing 'co-improving AI' systems rather than self-improving AI, suggesting a collaborative approach to AI advancement. The Import AI newsletter also covers reinforcement learning developments and discusses potential user annoyance with AI content labels.

AI · Bullish · OpenAI News · Dec 3 · 6/10 · 5

How confessions can keep language models honest

OpenAI researchers are developing a 'confessions' method to train AI language models to acknowledge their mistakes and undesirable behavior. This approach aims to enhance AI honesty, transparency, and overall trustworthiness in model outputs.

AI · Neutral · OpenAI News · Dec 1 · 5/10 · 4

Funding grants for new research into AI and mental health

OpenAI is providing up to $2 million in research grants focused on AI and mental health applications. The funding program aims to support studies examining real-world risks, benefits, and safety implications of AI in mental health contexts.

AI · Neutral · OpenAI News · Nov 25 · 6/10 · 4

Our approach to mental health-related litigation

OpenAI outlines its approach to mental health-related litigation involving ChatGPT. The company says it will handle sensitive cases with care, transparency, and respect while continuing to strengthen ChatGPT's safety and support features.

AI · Bullish · OpenAI News · Nov 19 · 6/10 · 8

Strengthening our safety ecosystem with external testing

OpenAI is collaborating with independent experts to conduct third-party testing of its frontier AI systems. This external evaluation aims to strengthen safety measures, validate existing safeguards, and improve transparency in assessing model capabilities and associated risks.

AI · Bullish · OpenAI News · Nov 13 · 6/10 · 7

Understanding neural networks through sparse circuits

OpenAI is researching mechanistic interpretability through sparse neural network models to better understand AI reasoning processes. This approach aims to make AI systems more transparent and improve their safety and reliability.

AI · Neutral · OpenAI News · Nov 12 · 6/10 · 3

GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum

OpenAI has released a system card addendum for GPT-5.1 Instant and GPT-5.1 Thinking models, providing updated safety metrics and evaluations. The addendum includes new assessments focused on mental health considerations and potential emotional reliance issues with the advanced AI systems.

AI · Neutral · OpenAI News · Nov 6 · 6/10 · 7

Introducing the Teen Safety Blueprint

OpenAI has introduced the Teen Safety Blueprint, a comprehensive framework designed to guide responsible AI development with specific protections for young users. The blueprint emphasizes age-appropriate design principles, built-in safeguards, and collaborative approaches to ensure AI systems protect and empower teenagers in digital environments.

AI · Neutral · OpenAI News · Nov 6 · 5/10 · 6

AI progress and recommendations

AI technology is advancing rapidly, presenting opportunities to guide its development toward beneficial outcomes. The focus is on steering AI progress toward scientific discovery, safety measures, and creating positive impacts for society.

AI · Bullish · OpenAI News · Oct 29 · 6/10 · 6

Introducing gpt-oss-safeguard

OpenAI has launched gpt-oss-safeguard, a new open-weight reasoning model designed for safety classification. The tool enables developers to implement and customize safety policies for their applications.

AI · Neutral · OpenAI News · Oct 29 · 6/10 · 8

gpt-oss-safeguard technical report

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are new open-weight reasoning models designed to label content according to a provided policy. They are post-trained versions of the original gpt-oss models, developed specifically for content moderation and safety evaluation tasks.

AI · Neutral · OpenAI News · Oct 27 · 6/10 · 7

Addendum to GPT-5 System Card: Sensitive conversations

OpenAI has released an addendum to GPT-5's system card detailing improvements in handling sensitive conversations. The update introduces new benchmarks for measuring emotional reliance, mental health interactions, and resistance to jailbreak attempts.

AI · Bullish · OpenAI News · Oct 27 · 6/10 · 6

Strengthening ChatGPT’s responses in sensitive conversations

OpenAI partnered with over 170 mental health experts to enhance ChatGPT's ability to handle sensitive conversations, improving distress recognition and empathetic responses. The collaboration reduced unsafe responses by up to 80% and improved guidance toward real-world mental health support.

AI · Neutral · Google DeepMind Blog · Oct 23 · 6/10 · 7

Strengthening our Frontier Safety Framework

Google DeepMind is strengthening its Frontier Safety Framework (FSF) to better identify and mitigate severe risks from advanced AI models. The update is part of its ongoing work to tighten AI safety protocols as models become more capable.

AI · Bullish · OpenAI News · Oct 14 · 6/10 · 6

Expert Council on Well-Being and AI

OpenAI has established a new Expert Council on Well-Being and AI, comprising psychologists, clinicians, and researchers to guide ChatGPT's support for emotional health, particularly for teenagers. The council's expertise will inform the development of safer and more empathetic AI experiences focused on mental wellness.

AI · Neutral · OpenAI News · Oct 9 · 6/10 · 7

Defining and evaluating political bias in LLMs

OpenAI has developed new real-world testing methods to evaluate and reduce political bias in ChatGPT. These methods focus on improving objectivity in AI responses and establishing better bias measurement frameworks.

AI · Neutral · OpenAI News · Oct 7 · 5/10 · 2

Disrupting malicious uses of AI: October 2025

OpenAI released its October 2025 report detailing efforts to detect and disrupt malicious uses of AI technology. The report covers the company's policy enforcement mechanisms and measures to protect users from AI-related harms.

AI · Bullish · OpenAI News · Sep 26 · 5/10 · 8

Partnering with AARP to help keep older adults safe online

OpenAI has partnered with AARP to enhance online safety for older adults through AI training programs, scam detection tools, and educational initiatives. The collaboration will leverage OpenAI Academy and OATS's Senior Planet program to deliver nationwide digital literacy and cybersecurity education.

AI · Neutral · OpenAI News · Sep 16 · 5/10 · 5

Building towards age prediction

OpenAI is developing age prediction technology and parental controls for ChatGPT to provide safer, age-appropriate interactions for teenage users. These new safety features aim to support families by creating more controlled AI experiences for younger users.

AI · Bullish · OpenAI News · Sep 9 · 5/10 · 6

Shipping smarter agents with every new model

SafetyKit is utilizing OpenAI's GPT-5 to improve content moderation and compliance enforcement capabilities. The system aims to deliver enhanced accuracy compared to traditional legacy safety systems through advanced AI integration.

AI · Bullish · OpenAI News · Sep 2 · 6/10 · 5

Building more helpful ChatGPT experiences for everyone

OpenAI announces new safety and user experience improvements for ChatGPT, including expert partnerships, enhanced parental controls for teen users, and routing sensitive conversations to more advanced reasoning models. These changes aim to make ChatGPT more helpful and safer across different user groups.

AI · Neutral · OpenAI News · Aug 27 · 6/10 · 8

Collective alignment: public input on our Model Spec

OpenAI surveyed over 1,000 people globally to gather public input on AI behavior standards and compared the responses with its Model Spec guidelines. The initiative is part of OpenAI's effort toward collective alignment, which aims to incorporate diverse human values and perspectives into AI system defaults.

AI · Bullish · OpenAI News · Aug 4 · 5/10 · 8

What we’re optimizing ChatGPT for

OpenAI is enhancing ChatGPT with new features focused on user wellbeing, including improved support for difficult situations, break reminders, and better life advice capabilities. These improvements are being developed with expert guidance to help users thrive in various aspects of their lives.