y0news

#safety-critical-ai News & Analysis

16 articles tagged with #safety-critical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Researchers introduce PaSBench-Video, a 740-video benchmark designed to evaluate multimodal large language models' ability to issue timely safety warnings in streaming video scenarios. Testing 13 MLLMs reveals that no model exceeds 20% accuracy on strict metrics, with models struggling to distinguish emerging hazards from routine activities, particularly in driving scenarios where safe and dangerous scenes appear visually similar.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning

Researchers introduce InPhyRe, a new benchmark showing that large multimodal models (LMMs) struggle with inductive physical reasoning, that is, the ability to apply learned physical laws to novel, unseen scenarios. Testing 13 LMMs revealed critical weaknesses: models fail to generalize parametric knowledge, perform poorly with unseen physical laws, and exhibit language bias that causes them to ignore visual inputs, raising concerns about their reliability for safety-critical applications.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

Researchers present alpha-beta-CROWN, a neural network verification framework that enables formal verification of learning-based controllers in safety-critical systems. The tool addresses scalability challenges in verifying controller properties like stability and safety by computing certified bounds on nonlinear functions and using GPU parallelization for complex verification tasks.

AI · Neutral · arXiv – CS AI · Apr 13 · 7/10

Medical Reasoning with Large Language Models: A Survey and MR-Bench

Researchers present a comprehensive survey of medical reasoning in large language models, introducing MR-Bench, a clinical benchmark derived from real hospital data. The study reveals a significant performance gap between exam-style tasks and authentic clinical decision-making, highlighting that robust medical reasoning requires more than factual recall in safety-critical healthcare applications.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Neural Distribution Prior for LiDAR Out-of-Distribution Detection

Researchers propose Neural Distribution Prior (NDP), a framework that significantly improves LiDAR-based out-of-distribution detection for autonomous driving by modeling prediction distributions and adaptively reweighting OOD scores. The approach achieves a 10x performance improvement over previous methods on benchmark tests, addressing critical safety challenges in open-world autonomous vehicle perception.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Towards provable probabilistic safety for scalable embodied AI systems

Researchers propose a shift from deterministic to probabilistic safety verification for embodied AI systems. They argue that provable probabilistic guarantees offer a more practical path to large-scale deployment in safety-critical applications such as autonomous vehicles and robotics than the infeasible goal of absolute safety across all scenarios.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

Researchers propose a method to guarantee safety in reinforcement learning agents by using variational autoencoders and dual optimization to construct probabilistic barrier certificates that separate safe from unsafe regions of behavior. The approach tightens safety bounds by targeting unexplored state-space regions during training, enabling deployment of RL systems with verified safety guarantees.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

DeepIPCv3 is a novel autonomous driving framework that combines LiDAR and Dynamic Vision Sensor (DVS) data using transformer-based cross-modal attention to improve pedestrian collision avoidance. The system addresses critical safety gaps in frame-based perception by leveraging microsecond-level event streams, achieving state-of-the-art performance in sudden crossing scenarios.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

Researchers demonstrate that synthetic data generated through inpainting can effectively augment hand detection models for safety-critical applications when trained using multi-stage scheduling approaches. The study shows that combining real and synthetic data with strategic fine-tuning improves detection accuracy on out-of-distribution scenarios like gloved hands, addressing a critical gap in occupational safety systems.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning

Researchers present a new theoretical framework for multi-task reinforcement learning that computes high-confidence performance guarantees on unseen tasks by combining per-task confidence bounds with task-level generalization. The approach addresses a critical gap in deploying RL policies in safety-critical applications where formal performance assurances are essential.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

Researchers propose a reinforcement learning framework that enables safer and more efficient transfer of AI agents from simulation to real-world deployment by using probabilistic latent embeddings and dynamic policy adaptation. The approach addresses the critical sim-to-real gap problem in cyber-physical systems like autonomous vehicles by inferring environment context and adjusting risk levels during deployment.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency

Researchers introduce DAROM, a reinforcement learning framework designed to handle stochastic communication delays in autonomous vehicle highway merging scenarios. The system uses a delay-aware encoder to maintain decision-making performance despite V2I transmission latencies up to 2.0 seconds, achieving over 99% success rates in high-density traffic conditions.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

Researchers propose Adaptive Conformal Semantic Entropy (ACSE), a novel method for quantifying uncertainty in large language model outputs by measuring semantic diversity rather than relying solely on lexical or probabilistic measures. The approach uses conformal calibration to provide statistical guarantees on error rates, demonstrating significant performance improvements over existing uncertainty quantification baselines.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant

The first LLM testing competition at ICSE 2026's DeepTest workshop evaluated four tools designed to benchmark an LLM-based automotive assistant, focusing on their ability to find cases where the system fails to surface critical safety warnings from car manuals. The competition assessed both the effectiveness of test discovery and the diversity of identified failures, establishing a benchmark for evaluating AI testing methodologies in safety-critical applications.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Towards Reasonable Concept Bottleneck Models

Researchers introduce CREAM (Concept Reasoning Models), an advanced framework for Concept Bottleneck Models that allows explicit encoding of concept relationships and concept-to-task mappings. The model maintains interpretability while achieving competitive performance even with incomplete concept sets through an optional side-channel, addressing a key limitation in explainable AI systems.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

SymptomWise introduces a deterministic reasoning framework that separates language understanding from diagnostic inference in AI-driven medical systems, combining expert-curated knowledge with constrained LLM use to improve reliability and reduce hallucinations. The system achieved 88% accuracy in placing correct diagnoses in top-five differentials on challenging pediatric neurology cases, demonstrating how structured approaches can enhance AI safety in critical domains.