y0news

#ai-safety News & Analysis

649 articles tagged with #ai-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · OpenAI News · Apr 11 · 6/10

Announcing OpenAI’s Bug Bounty Program

OpenAI has launched a bug bounty program to enhance the security and reliability of their AI systems. The initiative seeks external help from security researchers to identify vulnerabilities as part of their commitment to developing safe and advanced AI technology.

AI · Neutral · OpenAI News · Aug 24 · 6/10

Our approach to alignment research

An AI research organization outlines their approach to alignment research, focusing on improving AI systems' ability to learn from human feedback and assist in AI evaluation. Their ultimate goal is developing a sufficiently aligned AI system capable of solving all remaining AI alignment challenges.

AI · Bullish · OpenAI News · Aug 10 · 5/10

New and improved content moderation tooling

OpenAI has launched a new and improved content moderation tool called the Moderation endpoint for API developers. The tool enhances their previous content filtering capabilities and is available for free to developers using the OpenAI API.
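
A minimal sketch of calling the Moderation endpoint with only the standard library. The URL and request/response shapes follow OpenAI's public API documentation; the exact category names in a live response may differ from this illustration.

```python
import json
import urllib.request

def moderate(text: str, api_key: str) -> dict:
    """POST text to the Moderation endpoint and return the first result."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/moderations",
        data=json.dumps({"input": text}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]

def flagged_categories(result: dict) -> list[str]:
    """Extract the names of the categories the endpoint flagged."""
    return [name for name, hit in result.get("categories", {}).items() if hit]

# Parsing a sample result of the documented shape (illustrative values):
sample = {"flagged": True, "categories": {"hate": False, "violence": True}}
hits = flagged_categories(sample)
```

A caller would typically refuse or review any input for which `result["flagged"]` is true.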

AI · Neutral · OpenAI News · Jul 25 · 6/10

A hazard analysis framework for code synthesis large language models

The article presents a framework for analyzing potential hazards and risks associated with large language models that generate code. This research addresses growing concerns about AI-generated code safety and reliability as LLMs become more widely adopted for software development tasks.

AI · Neutral · OpenAI News · Jul 18 · 5/10

Reducing bias and improving safety in DALL·E 2

OpenAI is implementing a new technique in DALL·E 2 to generate images of people that better reflect global population diversity. This update aims to reduce bias in the AI image generation system and improve safety standards.

AI · Neutral · OpenAI News · Jun 28 · 5/10

DALL·E 2 pre-training mitigations

OpenAI implemented safety measures and guardrails during DALL·E 2's pre-training phase to mitigate risks associated with powerful AI image generation. These measures were designed to prevent the model from generating content that violates OpenAI's content policy before public release.

AI · Bullish · OpenAI News · Jun 13 · 6/10

AI-written critiques help humans notice flaws

Researchers developed AI models that can identify and describe flaws in text summaries, helping human evaluators detect problems more effectively. Larger AI models showed better self-critique capabilities than summary-writing abilities, suggesting potential for AI-assisted supervision of AI systems.

AI · Neutral · OpenAI News · May 28 · 5/10

Teaching models to express their uncertainty in words

Only the article title was available for analysis. It points to research on teaching AI models to verbally express their uncertainty, an area of development aimed at improving model transparency and reliability.

AI · Neutral · OpenAI News · Mar 3 · 6/10

Lessons learned on language model safety and misuse

AI developers share their latest insights on language model safety and misuse prevention to help the broader AI development community. The article focuses on lessons learned from deployed models and strategies for addressing potential safety concerns and harmful applications.

AI · Neutral · OpenAI News · Sep 23 · 5/10

Summarizing books with human feedback

This article discusses scaling human oversight of AI systems for tasks that are difficult to evaluate, specifically focusing on summarizing books with human feedback. The approach addresses the challenge of maintaining human control and evaluation in AI applications where traditional assessment methods may be insufficient.

AI · Neutral · OpenAI News · Sep 8 · 5/10

TruthfulQA: Measuring how models mimic human falsehoods

Based on the title, this article covers TruthfulQA, a benchmark dataset designed to measure whether AI language models reproduce common human misconceptions and false beliefs, making it a tool for evaluating model truthfulness.

AI · Neutral · Lil'Log (Lilian Weng) · Mar 21 · 6/10

Reducing Toxicity in Language Models

Large pretrained language models acquire toxic behavior and biases from internet training data, creating safety challenges for real-world deployment. The article explores three key approaches to address this issue: improving training dataset collection, enhancing toxic content detection, and implementing model detoxification techniques.
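
The first approach, filtering the training corpus, can be sketched as scoring each document with a toxicity classifier and dropping those above a threshold. The blocklist scorer below is a stand-in for a real detector, not a method from the article.

```python
def detoxify_corpus(docs, score_fn, threshold=0.5):
    """Keep only documents whose toxicity score is at or below the threshold.

    score_fn is any classifier returning a score in [0, 1]; here it is a
    toy stand-in for a real toxicity detector.
    """
    return [d for d in docs if score_fn(d) <= threshold]

# Toy scorer: fraction of words appearing on a small blocklist.
BLOCKLIST = {"idiot", "stupid"}

def toy_score(doc: str) -> float:
    words = doc.lower().split()
    return sum(w in BLOCKLIST for w in words) / max(len(words), 1)

clean = detoxify_corpus(["you are stupid", "the weather is nice"], toy_score, 0.2)
```

In practice the threshold trades off toxicity reduction against discarding useful data, which is why the article pairs filtering with detection and detoxification techniques.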

AI · Bullish · OpenAI News · Apr 16 · 6/10

Improving verifiability in AI development

A multi-stakeholder report by 58 co-authors across 30 organizations presents 10 mechanisms to improve verifiability of AI system claims. The tools enable developers to provide evidence of AI safety, security, fairness, and privacy while allowing users and policymakers to evaluate AI development processes.

AI · Bullish · OpenAI News · Nov 21 · 6/10

Safety Gym

OpenAI has released Safety Gym, a comprehensive suite of environments and tools designed to measure and evaluate progress in developing reinforcement learning agents that can respect safety constraints during training. This release addresses a critical need in AI development for standardized safety evaluation metrics.

AI · Neutral · OpenAI News · Sep 19 · 6/10

Fine-tuning GPT-2 from human preferences

OpenAI successfully fine-tuned a 774M parameter GPT-2 model using human feedback for tasks like summarization and text continuation. The research revealed challenges where human labelers' preferences didn't align with developers' intentions, with summarization models learning to copy text wholesale rather than generate original summaries.
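
The reward model at the heart of this setup is trained so that the human-preferred sample scores higher than the alternative. A pairwise Bradley-Terry-style loss is a common way to express this; the paper's actual comparison setup differed in details (e.g. choosing among several samples).

```python
import math

def preference_loss(r_preferred: float, r_other: float) -> float:
    """Negative log-likelihood that the reward model ranks the
    human-preferred sample above the alternative (pairwise sketch)."""
    return -math.log(1.0 / (1.0 + math.exp(r_other - r_preferred)))

# The loss shrinks as the reward margin in favor of the preferred
# sample grows, and equals log(2) when the model is indifferent.
indifferent = preference_loss(1.0, 1.0)
confident = preference_loss(3.0, 0.0)
```

The fitted reward model then supplies the training signal for reinforcement learning on the policy, which is where the copying failure mode described above surfaced.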

AI · Neutral · OpenAI News · Aug 20 · 6/10

GPT-2: 6-month follow-up

OpenAI released the 774 million parameter GPT-2 language model, completing their staged release approach that began with smaller models earlier in the year. The release includes an open-source legal agreement for model-sharing partnerships and a technical report on coordinating AI research publication norms.

AI · Neutral · OpenAI News · Jul 10 · 6/10

Why responsible AI development needs cooperation on safety

A policy research paper outlines four strategies to improve AI industry cooperation on safety: communicating risks/benefits, technical collaboration, transparency, and incentivizing standards. The research highlights that competitive pressures could create collective action problems leading to under-investment in AI safety.

AI · Bullish · OpenAI News · Mar 6 · 6/10

Introducing Activation Atlases

Researchers have developed activation atlases, a new technique for visualizing neural network interactions to better understand AI decision-making processes. This advancement aims to help identify weaknesses and investigate failures in AI systems as they are deployed in more sensitive applications.

AI · Neutral · OpenAI News · Feb 19 · 6/10

AI safety needs social scientists

OpenAI researchers published a paper arguing that AI safety and alignment research requires social scientists to address human psychology, rationality, and biases. The company plans to hire social scientists full-time to collaborate with machine learning researchers on ensuring AI systems properly align with human values.

AI · Neutral · OpenAI News · Oct 22 · 6/10

Learning complex goals with iterated amplification

Researchers propose iterated amplification, a new AI safety technique that allows specification of complex behaviors beyond human scale by demonstrating task decomposition rather than using labeled data or reward functions. The approach is in early experimental stages with testing limited to simple algorithmic domains, but shows potential as a scalable AI safety solution.
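
The decomposition idea can be illustrated with a toy recursion: split a task into subtasks, answer each with a weaker "base" system, and combine the results. This is a sketch of the concept in a trivial algorithmic domain, not OpenAI's implementation.

```python
def amplified_answer(question, decompose, base_answer, combine):
    """Answer a question by decomposing it into subquestions, answering
    each recursively, and combining the sub-answers."""
    subqs = decompose(question)
    if not subqs:  # simple enough for the base system to handle directly
        return base_answer(question)
    subanswers = [amplified_answer(q, decompose, base_answer, combine)
                  for q in subqs]
    return combine(question, subanswers)

# Toy domain: summing a list by repeatedly splitting it in half.
def decompose(xs):
    return [] if len(xs) <= 1 else [xs[: len(xs) // 2], xs[len(xs) // 2:]]

def base_answer(xs):
    return xs[0] if xs else 0

def combine(_question, parts):
    return sum(parts)

total = amplified_answer([1, 2, 3, 4, 5], decompose, base_answer, combine)
```

The point of the technique is that a human only needs to demonstrate the decomposition and combination steps, never to evaluate the full task end to end.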

AI · Bullish · OpenAI News · May 3 · 6/10

AI safety via debate

A new AI safety technique is proposed that involves training AI agents to debate topics with each other, with humans serving as judges to determine winners. This approach aims to improve AI safety through adversarial training and human oversight.
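
The debate setup can be sketched as a loop in which two agents alternate arguments and a judge scores the transcript. The scripted debaters and length-based judge below are purely illustrative stand-ins, not the paper's training procedure.

```python
def run_debate(claim, debater_a, debater_b, judge, rounds=2):
    """Toy debate loop: agents A and B alternate arguments about a claim,
    then a judge inspects the transcript and names the winner."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", debater_a(claim, transcript)))
        transcript.append(("B", debater_b(claim, transcript)))
    return judge(claim, transcript)

# Illustrative scripted agents and judge:
def debater_a(claim, transcript):
    return "the claim is supported by a verifiable source"

def debater_b(claim, transcript):
    return "no"

def judge(claim, transcript):
    # Toy judge: the side with more total argument text wins.
    totals = {"A": 0, "B": 0}
    for side, argument in transcript:
        totals[side] += len(argument)
    return max(totals, key=totals.get)

winner = run_debate("the sky is blue", debater_a, debater_b, judge)
```

In the proposed technique the judge is a human, and the hope is that honest arguments are easier to defend in an adversarial exchange than dishonest ones.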

AI · Neutral · OpenAI News · Feb 20 · 6/10

Preparing for malicious uses of AI

A collaborative research paper was published forecasting how malicious actors could misuse AI technology and proposing prevention and mitigation strategies. The year-long research effort involved multiple institutions including the Future of Humanity Institute, Centre for the Study of Existential Risk, and Electronic Frontier Foundation.

AI · Neutral · OpenAI News · Aug 3 · 5/10

Gathering human feedback

RL-Teacher is an open-source implementation that enables AI training through occasional human feedback instead of traditional hand-crafted reward functions. This technique was developed as a step toward creating safer AI systems and addresses reinforcement learning challenges where rewards are difficult to specify.

AI · Neutral · OpenAI News · Jun 21 · 6/10

Concrete AI safety problems

Researchers from multiple institutions including Google Brain, Berkeley, and Stanford have published a collaborative paper titled 'Concrete Problems in AI Safety.' The research explores various challenges in ensuring modern machine learning systems operate as intended and addresses safety considerations in AI development.

Page 25 of 26