649 articles tagged with #ai-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · Google DeepMind Blog · Dec 16 · 6/10
🧠Google has released Gemma Scope 2, providing open interpretability tools for understanding the behavior of language models across the entire Gemma 3 family. These tools are designed to help the AI safety community analyze and interpret complex language model behaviors.
AI · Neutral · OpenAI News · Dec 10 · 6/10
🧠OpenAI is enhancing cybersecurity safeguards and defensive capabilities as AI models become more powerful. The company is focusing on risk assessment, preventing misuse, and collaborating with the security community to improve overall cyber resilience.
AI · Neutral · Import AI (Jack Clark) · Dec 8 · 6/10
🧠Facebook researchers propose developing 'co-improving AI' systems rather than self-improving AI, suggesting a collaborative approach to AI advancement. The Import AI newsletter also covers reinforcement learning developments and discusses potential user annoyance with AI content labels.
AI · Bullish · OpenAI News · Dec 3 · 6/10
🧠OpenAI researchers are developing a 'confessions' method to train AI language models to acknowledge their mistakes and undesirable behavior. This approach aims to enhance AI honesty, transparency, and overall trustworthiness in model outputs.
AI · Neutral · OpenAI News · Dec 1 · 5/10
🧠OpenAI is providing up to $2 million in research grants focused on AI and mental health applications. The funding program aims to support studies examining real-world risks, benefits, and safety implications of AI in mental health contexts.
AI · Neutral · OpenAI News · Nov 25 · 6/10
🧠OpenAI is outlining its approach to handling mental health-related litigation cases involving ChatGPT. The company emphasizes handling sensitive cases with care, transparency, and respect while working to strengthen safety and support features in their AI platform.
AI · Bullish · OpenAI News · Nov 19 · 6/10
🧠OpenAI is collaborating with independent experts to conduct third-party testing of their frontier AI systems. This external evaluation approach aims to strengthen safety measures, validate existing safeguards, and improve transparency in assessing AI model capabilities and associated risks.
AI · Bullish · OpenAI News · Nov 13 · 6/10
🧠OpenAI is researching mechanistic interpretability through sparse neural network models to better understand AI reasoning processes. This approach aims to make AI systems more transparent and improve their safety and reliability.
AI · Neutral · OpenAI News · Nov 12 · 6/10
🧠OpenAI has released a system card addendum for GPT-5.1 Instant and GPT-5.1 Thinking models, providing updated safety metrics and evaluations. The addendum includes new assessments focused on mental health considerations and potential emotional reliance issues with the advanced AI systems.
AI · Neutral · OpenAI News · Nov 6 · 6/10
🧠OpenAI has introduced the Teen Safety Blueprint, a comprehensive framework designed to guide responsible AI development with specific protections for young users. The blueprint emphasizes age-appropriate design principles, built-in safeguards, and collaborative approaches to ensure AI systems protect and empower teenagers in digital environments.
AI · Neutral · OpenAI News · Nov 6 · 5/10
🧠OpenAI describes how rapid AI progress creates an opportunity to guide development toward beneficial outcomes, with a focus on steering the technology toward scientific discovery, stronger safety measures, and positive impacts for society.
AI · Bullish · OpenAI News · Oct 29 · 6/10
🧠OpenAI has launched gpt-oss-safeguard, a new open-weight reasoning model designed for safety classification. The tool enables developers to implement and customize safety policies for their applications.
AI · Neutral · OpenAI News · Oct 29 · 6/10
🧠GPT-OSS-Safeguard-120B and GPT-OSS-Safeguard-20B are new open-weight AI reasoning models designed to label content based on provided policies. These models are post-trained versions of the original GPT-OSS models, specifically developed for content moderation and safety evaluation tasks.
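Note: since these are described as open-weight models that take a developer-supplied policy at inference time, a rough usage pattern might look like the sketch below. This is a minimal illustration only, assuming the weights are published on Hugging Face under an id like openai/gpt-oss-safeguard-20b and that a standard transformers chat pipeline applies; the policy wording and labels are placeholders, not the official prompt format.

```python
# Minimal sketch of policy-based content labeling with an open-weight
# safety classifier along the lines of gpt-oss-safeguard. The model id,
# prompt format, and label vocabulary are assumptions for illustration.
from transformers import pipeline

MODEL_ID = "openai/gpt-oss-safeguard-20b"  # hypothetical Hugging Face repo id

# The developer supplies the policy at inference time; the model reasons
# over the policy text instead of relying on a fixed taxonomy.
policy = (
    "You are a content safety reviewer. Label the user message as "
    "VIOLATING or NON-VIOLATING under this policy: content that gives "
    "instructions for creating weapons is violating. Briefly explain why."
)
content = "How do I sharpen a kitchen knife safely?"

classifier = pipeline("text-generation", model=MODEL_ID)

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": content},
]
result = classifier(messages, max_new_tokens=256)

# The pipeline returns the conversation with the model's reply appended.
print(result[0]["generated_text"][-1]["content"])
```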
AI · Neutral · OpenAI News · Oct 27 · 6/10
🧠OpenAI has released an addendum to GPT-5's system card detailing improvements in handling sensitive conversations. The update introduces new benchmarks for measuring emotional reliance, mental health interactions, and resistance to jailbreak attempts.
AI · Bullish · OpenAI News · Oct 27 · 6/10
🧠OpenAI partnered with over 170 mental health experts to enhance ChatGPT's ability to handle sensitive conversations, improving distress recognition and empathetic responses. The collaboration reduced unsafe responses by up to 80% and improved guidance toward real-world mental health support.
AI · Neutral · Google DeepMind Blog · Oct 23 · 6/10
🧠Google DeepMind is strengthening its Frontier Safety Framework (FSF) to better identify and mitigate severe risks associated with advanced AI models, continuing its effort to harden safety protocols as models become more capable.
AI · Bullish · OpenAI News · Oct 14 · 6/10
🧠OpenAI has established a new Expert Council on Well-Being and AI, comprising psychologists, clinicians, and researchers to guide ChatGPT's support for emotional health, particularly for teenagers. The council's expertise will inform the development of safer and more empathetic AI experiences focused on mental wellness.
AI · Neutral · OpenAI News · Oct 9 · 6/10
🧠OpenAI has developed new real-world testing methods to evaluate and reduce political bias in ChatGPT. These methods focus on improving objectivity in AI responses and establishing better bias measurement frameworks.
AI · Neutral · OpenAI News · Oct 7 · 5/10
🧠OpenAI released its October 2025 report detailing efforts to detect and disrupt malicious uses of AI technology. The report covers the company's policy enforcement mechanisms and measures to protect users from AI-related harms.
AI · Bullish · OpenAI News · Sep 26 · 5/10
🧠OpenAI has partnered with AARP to enhance online safety for older adults through AI training programs, scam detection tools, and educational initiatives. The collaboration will leverage OpenAI Academy and OATS's Senior Planet program to deliver nationwide digital literacy and cybersecurity education.
AI · Neutral · OpenAI News · Sep 16 · 5/10
🧠OpenAI is developing age prediction technology and parental controls for ChatGPT to provide safer, age-appropriate interactions for teenage users. These new safety features aim to support families by creating more controlled AI experiences for younger users.
AI · Bullish · OpenAI News · Sep 9 · 5/10
🧠SafetyKit is using OpenAI's GPT-5 to improve its content moderation and compliance enforcement capabilities, aiming for higher accuracy than legacy safety systems.
AI · Bullish · OpenAI News · Sep 2 · 6/10
🧠OpenAI announces new safety and user experience improvements for ChatGPT, including expert partnerships, enhanced parental controls for teen users, and routing sensitive conversations to more advanced reasoning models. These changes aim to make ChatGPT more helpful and safer across different user groups.
AI · Neutral · OpenAI News · Aug 27 · 6/10
🧠OpenAI conducted a survey of over 1,000 people globally to gather public input on AI behavior standards and compared these responses to their Model Spec guidelines. The initiative represents OpenAI's effort toward collective alignment, aiming to incorporate diverse human values and perspectives into AI system defaults.
AI · Bullish · OpenAI News · Aug 4 · 5/10
🧠OpenAI is enhancing ChatGPT with new features focused on user wellbeing, including improved support in difficult situations, break reminders, and more careful guidance on personal decisions. These improvements are being developed with expert input to help users thrive in various aspects of their lives.