#safety-critical-ai News & Analysis

8 articles tagged with #safety-critical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 15h ago · 7/10

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

Researchers present alpha-beta-CROWN, a neural network verification framework that enables formal verification of learning-based controllers in safety-critical systems. The tool addresses scalability challenges in verifying controller properties like stability and safety by computing certified bounds on nonlinear functions and using GPU parallelization for complex verification tasks.

AI · Neutral · arXiv – CS AI · Apr 13 · 7/10

Medical Reasoning with Large Language Models: A Survey and MR-Bench

Researchers present a comprehensive survey of medical reasoning in large language models, introducing MR-Bench, a clinical benchmark derived from real hospital data. The study reveals a significant performance gap between exam-style tasks and authentic clinical decision-making, highlighting that robust medical reasoning requires more than factual recall in safety-critical healthcare applications.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Neural Distribution Prior for LiDAR Out-of-Distribution Detection

Researchers propose Neural Distribution Prior (NDP), a framework that significantly improves LiDAR-based out-of-distribution detection for autonomous driving by modeling prediction distributions and adaptively reweighting OOD scores. The approach achieves a 10x performance improvement over previous methods on benchmark tests, addressing critical safety challenges in open-world autonomous vehicle perception.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Towards provable probabilistic safety for scalable embodied AI systems

Researchers propose a shift from deterministic to probabilistic safety verification for embodied AI systems, arguing that provable probabilistic guarantees offer a more practical path to large-scale deployment in safety-critical applications like autonomous vehicles and robotics than the infeasible goal of absolute safety across all scenarios.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

Researchers propose Adaptive Conformal Semantic Entropy (ACSE), a novel method for quantifying uncertainty in large language model outputs by measuring semantic diversity rather than relying solely on lexical or probabilistic measures. The approach uses conformal calibration to provide statistical guarantees on error rates, demonstrating significant performance improvements over existing uncertainty quantification baselines.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant

The first LLM testing competition, held at the DeepTest workshop at ICSE 2026, evaluated four tools for testing an LLM-based automotive assistant, focusing on how well they find cases where the assistant fails to surface critical safety warnings from car manuals. The competition assessed both the effectiveness of test discovery and the diversity of the failures found, establishing a benchmark for evaluating AI testing methodologies in safety-critical applications.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Towards Reasonable Concept Bottleneck Models

Researchers introduce CREAM (Concept Reasoning Models), a framework for Concept Bottleneck Models that allows explicit encoding of concept relationships and concept-to-task mappings. The model maintains interpretability and, through an optional side-channel, achieves competitive performance even with incomplete concept sets, addressing a key limitation in explainable AI systems.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

SymptomWise introduces a deterministic reasoning framework that separates language understanding from diagnostic inference in AI-driven medical systems, combining expert-curated knowledge with constrained LLM use to improve reliability and reduce hallucinations. The system achieved 88% accuracy in placing correct diagnoses in top-five differentials on challenging pediatric neurology cases, demonstrating how structured approaches can enhance AI safety in critical domains.