y0news

#machine-learning-safety News & Analysis

6 articles tagged with #machine-learning-safety. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Geometry over Density: Few-Shot Cross-Domain OOD Detection

Researchers introduce UFCOD, a novel framework that enables out-of-distribution detection across arbitrary domains using a single pre-trained diffusion model and minimal inference-time samples. The approach achieves 93.7% average AUROC on cross-domain benchmarks with approximately 500× better sample efficiency than existing methods, requiring only ~100 unlabeled samples rather than 50k-163k training samples.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

Researchers introduce SafeAdapt, a novel framework for updating reinforcement learning policies while maintaining provable safety guarantees across changing environments. The approach uses a 'Rashomon set' to identify safe parameter regions and projects policy updates onto this certified space, addressing the critical challenge of deploying RL agents in safety-critical applications where dynamics and objectives evolve over time.
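
The "project the update onto a certified region" step can be illustrated with a deliberately simplified sketch. It assumes the safe region is an axis-aligned box given by hypothetical `lower` and `upper` bounds, so the projection reduces to per-coordinate clipping. The summary does not say how SafeAdapt represents its Rashomon set or performs the projection, so this is an illustration of the general idea, not the paper's method.

```python
from typing import List, Sequence


def project_onto_box(params: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> List[float]:
    """Euclidean projection onto an axis-aligned box: clip each coordinate."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(params, lower, upper)]


def safe_update(params: Sequence[float],
                grad: Sequence[float],
                lr: float,
                lower: Sequence[float],
                upper: Sequence[float]) -> List[float]:
    """Take one gradient step, then project the result back into the box."""
    proposed = [p - lr * g for p, g in zip(params, grad)]
    return project_onto_box(proposed, lower, upper)


# Hypothetical two-parameter policy and a hypothetical certified box.
theta = [0.5, -0.2]
grad = [-10.0, 0.5]
lower, upper = [0.0, -1.0], [1.0, 1.0]

# The raw step would move the first parameter to 1.5, outside the box;
# the projection pulls it back to the boundary at 1.0.
new_theta = safe_update(theta, grad, lr=0.1, lower=lower, upper=upper)
```

Whatever the unconstrained step proposes, the returned parameters stay inside the region assumed to be safe, which is the property the summary describes.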

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Calibrating Uncertainty for Zero-Shot Adversarial CLIP

Researchers propose an adversarial fine-tuning method for CLIP that addresses a critical gap in zero-shot classification: while perturbations degrade accuracy, they also suppress uncertainty estimates, causing overconfidence. The approach reparameterizes CLIP outputs as Dirichlet distribution parameters to jointly optimize for robustness and calibrated uncertainty, achieving competitive results across benchmarks.
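
One common way to read classifier outputs as Dirichlet parameters comes from evidential deep learning: map each logit to a non-negative "evidence" value, add one to get a concentration parameter, and treat low total evidence as high uncertainty. The sketch below follows that convention. The softplus mapping and the `K / sum(alpha)` uncertainty measure are assumptions made for illustration; the summary does not state the exact parameterization the paper uses.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_from_logits(
    logits: List[float],
) -> Tuple[List[float], List[float], float]:
    """Return (alpha, expected class probabilities, uncertainty mass)."""
    evidence = [softplus(z) for z in logits]   # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]        # Dirichlet concentrations
    strength = sum(alpha)                      # total concentration
    probs = [a / strength for a in alpha]      # mean of the Dirichlet
    uncertainty = len(alpha) / strength        # in (0, 1]; near 1 means no evidence
    return alpha, probs, uncertainty


# Hypothetical logits: strong evidence for class 0 vs. almost no evidence at all.
_, confident_probs, confident_u = dirichlet_from_logits([10.0, -10.0, -10.0])
_, vague_probs, vague_u = dirichlet_from_logits([-10.0, -10.0, -10.0])
```

Under this reading, an input with little evidence for any class yields an uncertainty close to 1 instead of an overconfident prediction, which is the behavior the summary says adversarial perturbations tend to suppress.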

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

SAEmnesia: Erasing Concepts in Diffusion Models with Supervised Sparse Autoencoders

Researchers introduce SAEmnesia, a supervised sparse autoencoder framework that enables efficient concept unlearning in diffusion models by binding concepts to individual neurons. The method reduces computational overhead by 96.67% compared to existing approaches and achieves 9.22% improvement on benchmark tests, with demonstrated robustness against adversarial attacks.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models

Researchers challenge the standard approach of using text embeddings as class prototypes in out-of-distribution detection with vision-language models, demonstrating a fundamental misalignment between text and visual feature spaces. They propose an online pseudo-supervised framework that learns visual prototypes directly from unlabeled test data, achieving state-of-the-art OOD detection performance.
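
The idea of scoring against visual prototypes instead of text embeddings can be sketched as follows. This assumes image features and pseudo-labels are already available, builds each prototype as the normalized mean of its class's features, and uses the maximum cosine similarity as the in-distribution score. All function names are hypothetical, and the batch construction here stands in for the online, pseudo-supervised learning the summary mentions, whose details are not given.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_prototypes(features: Sequence[Sequence[float]],
                     pseudo_labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average the unit-normalized features of each pseudo-class."""
    groups = defaultdict(list)
    for f, y in zip(features, pseudo_labels):
        groups[y].append(normalize(f))
    prototypes = {}
    for y, fs in groups.items():
        mean = [sum(col) / len(fs) for col in zip(*fs)]
        prototypes[y] = normalize(mean)
    return prototypes


def id_score(feature: Sequence[float],
             prototypes: Dict[int, List[float]]) -> float:
    """Maximum cosine similarity to any prototype; lower suggests OOD."""
    f = normalize(feature)
    return max(sum(a * b for a, b in zip(f, p)) for p in prototypes.values())


# Hypothetical 2-D image features with pseudo-labels for two classes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pseudo_labels = [0, 0, 1, 1]
prototypes = build_prototypes(features, pseudo_labels)

near_score = id_score([1.0, 0.05], prototypes)   # close to the class-0 prototype
far_score = id_score([-1.0, -1.0], prototypes)   # points away from both prototypes
```

Because the prototypes live in the same feature space as the test images, the similarity score is not affected by any offset between text and image embeddings, which is the misalignment the summary describes.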

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 6

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection

Researchers propose PGOS (Policy-Guided Outlier Synthesis), a new framework that uses reinforcement learning to improve Graph Neural Network safety by better detecting out-of-distribution graphs. The system replaces static sampling methods with a learned exploration strategy that navigates low-density regions to generate pseudo-OOD graphs for enhanced detector training.