649 articles tagged with #ai-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Apr 7 · 5/10
🧠Researchers developed an automated framework using large language models to compare AI safety policy documents across a shared taxonomy of activities. The study found that model choice significantly affects comparison outcomes, with some document pairs showing high disagreement across different LLMs, though human expert evaluation showed high inter-annotator agreement.
AI · Neutral · arXiv – CS AI · Mar 26 · 5/10
🧠Researchers developed a new training-free approach for out-of-distribution (OOD) detection that uses multiple neural network layers instead of just the final layer. The method improves detection accuracy by up to 4.41% AUROC and reduces false positives by 13.58% across various architectures.
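The summary doesn't spell out the paper's scoring rule; as a rough illustration of the general idea, here is a minimal sketch in which per-layer feature distances are aggregated into one OOD score. The data is synthetic and the helper names are hypothetical, not from the paper.

```python
# Sketch of training-free, multi-layer OOD scoring (generic illustration;
# the paper's exact aggregation rule is not described in the summary).
import numpy as np

rng = np.random.default_rng(0)

def layer_score(feats, train_feats):
    # Distance of a sample's layer features to the training-feature centroid;
    # a larger distance suggests the input is out-of-distribution at this layer.
    mu = train_feats.mean(axis=0)
    return np.linalg.norm(feats - mu)

def multi_layer_ood_score(sample_feats, train_feats_per_layer):
    # Aggregate evidence across several layers instead of only the final one.
    return sum(layer_score(f, t) for f, t in zip(sample_feats, train_feats_per_layer))

# Toy example: 3 layers, in-distribution features near 0, OOD features shifted.
train = [rng.normal(0.0, 1.0, size=(500, 16)) for _ in range(3)]
id_sample = [rng.normal(0.0, 1.0, size=16) for _ in range(3)]
ood_sample = [rng.normal(4.0, 1.0, size=16) for _ in range(3)]

assert multi_layer_ood_score(ood_sample, train) > multi_layer_ood_score(id_sample, train)
```

The point of using several layers is that distribution shift can be visible in early features even when the final layer looks unremarkable.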
AI · Neutral · arXiv – CS AI · Mar 17 · 4/10
🧠Researchers propose CESA-LinUCB, a new approach to robust reinforcement learning that addresses 'Contextual Sycophancy' where evaluators are truthful in normal situations but biased in critical contexts. The method learns trust boundaries for each evaluator and achieves sublinear regret even when no evaluator is globally reliable.
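CESA-LinUCB itself isn't specified in the summary; for context, here is a minimal sketch of standard LinUCB, the base algorithm the name suggests it extends. The per-evaluator trust-boundary machinery from the paper is not shown, and the toy environment is an assumption.

```python
# Minimal standard LinUCB sketch (base algorithm only; the paper's
# contextual-sycophancy handling is not reproduced here).
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # ridge-regularized design matrix
        self.b = np.zeros(dim)    # accumulated reward-weighted contexts
        self.alpha = alpha        # exploration width

    def choose(self, contexts):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # Upper confidence bound: estimated reward plus exploration bonus.
        ucb = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Toy linear-bandit loop with a hidden reward vector.
rng = np.random.default_rng(1)
true_theta = np.array([1.0, -0.5, 0.2])
bandit = LinUCB(dim=3)
for _ in range(200):
    contexts = rng.normal(size=(5, 3))
    i = bandit.choose(contexts)
    reward = contexts[i] @ true_theta + rng.normal(scale=0.1)
    bandit.update(contexts[i], reward)
```

After the loop, the ridge estimate `inv(A) @ b` should point in roughly the same direction as `true_theta`.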
AI · Neutral · MarkTechPost · Mar 10 · 5/10
🧠This tutorial demonstrates building an advanced AI agent system that incorporates risk-awareness through internal criticism, self-consistency reasoning, and uncertainty estimation. The system evaluates responses across multiple dimensions including accuracy, coherence, and safety while implementing risk-sensitive selection strategies for more reliable decision-making.
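A hedged sketch of one ingredient the summary names, self-consistency combined with a risk-sensitive (abstaining) selection rule. The function name and threshold are hypothetical, not taken from the tutorial.

```python
# Self-consistency with risk-sensitive selection (illustrative only; the
# tutorial's full agent framework is not reproduced here).
from collections import Counter

def risk_sensitive_select(candidates, min_agreement=0.5):
    """Pick the majority answer among sampled candidates, but abstain when
    agreement is too low -- a crude uncertainty estimate."""
    counts = Counter(candidates)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(candidates)
    if agreement < min_agreement:
        return None, agreement   # abstain: the samples disagree too much
    return answer, agreement

# High agreement -> confident answer; split votes -> abstention.
print(risk_sensitive_select(["42", "42", "42", "17"]))  # ('42', 0.75)
print(risk_sensitive_select(["a", "b", "c", "d"]))      # (None, 0.25)
```

Abstaining under disagreement is what makes the selection risk-sensitive: the agent trades coverage for reliability.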
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠A research study examines how parents want to moderate their children's interactions with GenAI chatbots, revealing gaps in current parental control tools. The study used LLM-generated scenarios to identify that parents need more granular, personalized controls at the conversation level rather than broad content filtering.
AI · Bearish · arXiv – CS AI · Mar 4 · 4/10
🧠This is a satirical academic paper that critiques AI pluralistic alignment research by using the absurd metaphor of 'mulching' humans into nutrient slurry. The authors parody current AI ethics frameworks to highlight how technical approaches to value alignment can potentially enable harmful systems.
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠Researchers developed PPO-LTL, a new framework that integrates Linear Temporal Logic safety constraints into Proximal Policy Optimization for safer reinforcement learning. The system uses Büchi automata to monitor safety violations and converts them into penalty signals, showing reduced safety violations while maintaining competitive performance in robotics environments.
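The Büchi-automaton construction isn't detailed in the summary; here is a toy sketch of the general pattern it describes, an automaton-style monitor whose violation state is converted into a penalty on the RL reward. All names and the penalty value are illustrative assumptions.

```python
# Generic sketch of automaton-based safety shaping in the spirit of PPO-LTL
# (the actual Buchi-automaton construction and PPO integration are not shown).
class SafetyMonitor:
    """Two-state monitor for the LTL property G(not unsafe):
    once an unsafe observation occurs, the monitor stays in its
    violation state for the rest of the episode."""
    def __init__(self):
        self.violated = False

    def step(self, unsafe: bool) -> bool:
        if unsafe:
            self.violated = True
        return self.violated

def shaped_reward(env_reward, monitor, unsafe, penalty=10.0):
    # Convert a monitored safety violation into a penalty signal.
    return env_reward - penalty if monitor.step(unsafe) else env_reward

monitor = SafetyMonitor()
trace = [(1.0, False), (1.0, False), (1.0, True), (1.0, False)]
rewards = [shaped_reward(r, monitor, u) for r, u in trace]
print(rewards)  # [1.0, 1.0, -9.0, -9.0] -- violation persists once triggered
```

The persistent violation state is what distinguishes a temporal-logic monitor from a per-step cost: safety is a property of the whole trajectory.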
AI · Neutral · OpenAI News · Sep 29 · 4/10
🧠OpenAI is introducing parental controls for ChatGPT along with a dedicated parent resource page to help families manage and guide their children's interactions with the AI assistant at home.
AI · Neutral · Hugging Face Blog · Apr 29 · 4/10
🧠Meta has released Llama Guard 4, a new AI safety model, now available on Hugging Face Hub. This represents Meta's continued investment in AI safety infrastructure and content moderation capabilities.
AI · Neutral · Hugging Face Blog · Apr 14 · 4/10
🧠The article title suggests a 6-month collaboration between Protect AI and Hugging Face has resulted in scanning 4 million AI models. However, the article body appears to be empty, preventing detailed analysis of the partnership's findings or implications.
AI · Neutral · OpenAI News · Aug 8 · 4/10
🧠This article appears to be an acknowledgements section for external testers who contributed to the GPT-4o system card. The content provided is limited to just the title and acknowledgements header without detailed information about the testing process or findings.
AI · Neutral · Lil'Log (Lilian Weng) · Jul 7 · 5/10
🧠This article defines and categorizes hallucination in large language models, specifically focusing on extrinsic hallucination where model outputs are not grounded in world knowledge. The author distinguishes between in-context hallucination (inconsistent with provided context) and extrinsic hallucination (not verifiable by external knowledge), emphasizing that LLMs must be factual and acknowledge uncertainty to avoid fabricating information.
AI · Neutral · OpenAI News · Apr 23 · 4/10
🧠The article appears to discuss OpenAI's approach to implementing safety by design principles specifically focused on child protection measures. However, the article body content was not provided, limiting detailed analysis of the specific safety measures and their implications.
AI · Neutral · OpenAI News · Apr 5 · 4/10
🧠OpenAI outlines its commitment to AI safety as a core component of its mission. The article emphasizes the critical importance of ensuring AI systems are built, deployed, and used safely.
AI · Neutral · OpenAI News · Nov 21 · 4/10
🧠The article title references benchmarking safe exploration techniques in deep reinforcement learning, which is a critical area of AI research focused on developing algorithms that can learn while avoiding harmful or dangerous actions. However, no article body content was provided for analysis.
AI · Neutral · OpenAI News · May 3 · 4/10
🧠The article discusses research on adversarial robustness transfer between different types of perturbations in machine learning models. This research examines how defensive techniques developed for one type of attack may provide protection against other types of adversarial examples.
AI · Neutral · OpenAI News · Dec 21 · 4/10
🧠This article explores a critical failure mode in reinforcement learning where algorithms break due to misspecified reward functions. The post examines how improper reward design can lead to unexpected and counterintuitive behaviors in AI systems.
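A toy illustration, not taken from the post, of how a misspecified reward can be gamed: rewarding movement instead of progress makes endless oscillation outscore actually reaching the goal. The environment and function names are hypothetical.

```python
# Toy reward-misspecification demo: the designer intends "reach the goal",
# but the proxy reward pays for distance moved per step, so a reward-maximizing
# agent learns to bounce between two cells forever.
def proxy_reward(prev_pos, pos):
    return abs(pos - prev_pos)        # rewards movement, not progress

def oscillate(steps):
    pos, total = 0, 0.0
    for _ in range(steps):
        nxt = 1 if pos == 0 else 0    # bounce between two cells
        total += proxy_reward(pos, nxt)
        pos = nxt
    return total

def go_to_goal(goal, steps):
    pos, total = 0, 0.0
    for _ in range(steps):
        nxt = min(pos + 1, goal)      # walk to the goal, then stop
        total += proxy_reward(pos, nxt)
        pos = nxt
    return total

# Oscillating collects more proxy reward than completing the task.
print(oscillate(10), go_to_goal(goal=3, steps=10))  # 10.0 3.0
```

The gap between the two totals is the misspecification: the optimal policy under the proxy reward never satisfies the designer's real objective.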
AI · Neutral · OpenAI News · Jun 20 · 4/10
🧠OpenAI reiterates its core mission to develop safe artificial intelligence while ensuring AI benefits are distributed widely and equitably across global populations. This represents the company's foundational commitment to responsible AI development and democratization of AI technology access.
AI · Neutral · Hugging Face Blog · Sep 18 · 3/10
🧠The article title suggests coverage of RiskRubric.ai, a platform focused on democratizing AI safety. However, no article body content was provided for analysis, preventing detailed assessment of the platform's features, impact, or market implications.
AI · Neutral · OpenAI News · Sep 12 · 3/10
🧠OpenAI has published acknowledgements for external testers who contributed to the o1 system card. This appears to be a formal recognition of individuals or organizations who helped test and validate OpenAI's o1 reasoning model during its development phase.
AI · Neutral · Hugging Face Blog · Feb 24 · 3/10
🧠The article title suggests content about red-teaming large language models, which involves testing AI systems for vulnerabilities and potential risks. However, no article body content was provided for analysis.
AI · Neutral · Hugging Face Blog · Mar 21 · 1/10
🧠The article title suggests the introduction of a new system called 'Chatbot Guardrails Arena' but no article content was provided for analysis. Without the actual article body, it's impossible to determine the specific details, implications, or significance of this development.
AI · Neutral · Hugging Face Blog · Feb 23 · 1/10
🧠The article title suggests the introduction of a Red-Teaming Resistance Leaderboard, but no article body content was provided for analysis.
AI · Neutral · Hugging Face Blog · Feb 1 · 1/10
🧠The article title suggests a discussion of Constitutional AI implementation using open-source large language models, but no article body content was provided for analysis.