DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
Researchers introduce DEFault++, a diagnostic system that automatically detects faults in transformer neural networks, categorizes them, and identifies their root causes across 12 fault categories and 45 underlying mechanisms. The tool achieves an AUROC above 0.96 in fault detection, and in a developer study it raised the share of correctly chosen repair actions from 57.1% to 83.3%, a relative improvement of about 46%.
Transformer models power most modern AI systems, yet their internal failures often occur silently without triggering observable errors, creating blind spots in production deployments. DEFault++ addresses this critical reliability gap by introducing a hierarchical diagnostic framework that moves beyond generic neural network fault detection to identify transformer-specific component failures. The system uses runtime behavioral measurements organized through a Fault Propagation Graph to trace how errors cascade through attention mechanisms, projections, and other architectural elements.
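To make the idea of tracing a cascade concrete, here is a minimal sketch of graph-based root-cause narrowing. It is not the DEFault++ implementation: the component names, the edge layout in `GRAPH`, and the `root_candidates` helper are all hypothetical, chosen only to show how a directed graph of components can separate an upstream fault from its downstream symptoms.

```python
from collections import deque

# Hypothetical component graph (an assumption for illustration, not taken
# from the paper). Edges point from an upstream component to the
# components it feeds.
GRAPH = {
    "embedding": ["qkv_projection"],
    "qkv_projection": ["attention_scores"],
    "attention_scores": ["attention_output"],
    "attention_output": ["feed_forward"],
    "feed_forward": ["layer_norm"],
    "layer_norm": ["logits"],
    "logits": [],
}


def downstream(graph, start):
    """Return every component reachable from `start`, excluding `start`."""
    seen = set()
    queue = deque(graph[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph[node])
    return seen


def root_candidates(graph, anomalous):
    """Return anomalous components that no other anomalous component feeds.

    An anomaly that sits downstream of another anomaly may be a symptom of
    it, so only the most upstream anomalies are kept as candidates.
    """
    anomalous = set(anomalous)
    explained = set()
    for node in anomalous:
        explained |= downstream(graph, node) & anomalous
    return sorted(anomalous - explained)


# Three components show abnormal runtime measurements; only the most
# upstream one is reported as a root-cause candidate.
candidates = root_candidates(
    GRAPH, {"attention_scores", "attention_output", "logits"}
)
print(candidates)
```

In this toy chain the anomalies in `attention_output` and `logits` are both reachable from `attention_scores`, so only `attention_scores` survives as a candidate.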
DEFault-bench, a benchmark of 3,739 systematically generated faulty transformer instances spanning seven models and nine tasks, gives transformer fault diagnosis its first large-scale, labeled evaluation foundation. It fills a gap left by existing debugging tools, which treat transformers as generic deep networks and so miss the failure modes specific to attention mechanisms and multi-head architectures. Prototype matching and supervised contrastive learning make the diagnoses interpretable rather than black-box predictions.
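Prototype matching can be illustrated with a nearest-prototype classifier. The sketch below is a generic version of that technique, not the paper's model: the two-dimensional embeddings, the fault labels, and the helper names are invented for illustration. In a real system the embeddings would come from a learned encoder, for example one trained with a supervised contrastive loss.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def class_prototypes(embeddings, labels):
    """Compute one prototype per label as the mean of its embeddings."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [x / counts[label] for x in acc] for label, acc in sums.items()
    }


def match(prototypes, query):
    """Return the best-matching label and the similarity to every prototype."""
    scores = {
        label: cosine(proto, query) for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical training embeddings and fault labels.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["attention", "attention", "normalization", "normalization"]

prototypes = class_prototypes(embeddings, labels)
best, scores = match(prototypes, [0.8, 0.2])
print(best, scores)
```

Because the prediction is the label of the closest prototype, the per-label similarities in `scores` double as an explanation of why a diagnosis was chosen.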
For AI development teams and organizations deploying transformers in critical applications, DEFault++ reduces the time and expertise required to diagnose production failures. The developer study showing improvement from 57.1% to 83.3% accuracy in choosing repair actions demonstrates tangible value beyond academic metrics. This work signals growing maturity in AI reliability engineering, paralleling quality assurance practices that became standard in traditional software development decades ago.
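The developer-study figures can be read two ways, as an absolute gain in percentage points or as a relative improvement. The short calculation below uses only the two accuracy values reported above; the variable names are illustrative.

```python
# Repair-action accuracy reported in the developer study.
before, after = 57.1, 83.3

# Absolute gain, measured in percentage points.
absolute_gain = after - before

# Relative gain, measured as a percentage of the unassisted baseline.
relative_gain = (after - before) / before * 100

print(f"{absolute_gain:.1f} percentage points")  # 26.2 percentage points
print(f"{relative_gain:.1f}% relative")          # 45.9% relative
```

Both figures describe the same result: a gain of 26.2 percentage points corresponds to roughly 46% more correct repair choices than the unassisted baseline.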
As transformer deployment expands across financial systems, healthcare, and autonomous applications, diagnostic tools like DEFault++ may become essential infrastructure. Future work likely involves integrating such systems into continuous monitoring pipelines and extending fault diagnosis to vision transformers and other emerging architectures.
- DEFault++ detects and diagnoses faults in transformer models across 12 categories and 45 underlying mechanisms with AUROC above 0.96
- Developer study shows the tool improves repair action accuracy by 26.2 percentage points compared to manual diagnosis
- DEFault-bench provides the first large-scale labeled dataset of transformer faults for training and evaluating diagnostic systems
- The diagnostic approach uses Fault Propagation Graphs to trace how errors cascade through transformer architecture
- System achieves Macro-F1 scores of 0.85 for both fault categorization and root-cause identification on encoder-decoder models
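For readers unfamiliar with the two metrics quoted above, the following sketch computes AUROC and Macro-F1 from scratch on toy data. The labels and scores are made up and are unrelated to the reported results; the function names are illustrative.

```python
def auroc(y_true, scores):
    """Probability that a faulty instance (1) outscores a healthy one (0).

    Ties between a faulty and a healthy score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy detection scores: 1 = faulty, 0 = healthy.
detection = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Toy categorization with two hypothetical fault categories.
categorization = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])

print(detection, categorization)
```

Macro-F1 averages over classes without weighting by frequency, so rare fault categories count as much as common ones. That makes it a natural choice when a benchmark covers many categories of uneven size.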