GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees
Researchers introduce GF-Score, a framework that evaluates neural network robustness across individual classes while measuring fairness disparities, eliminating the need for expensive adversarial attacks through self-calibration. Testing across 22 models reveals consistent vulnerability patterns and shows that more robust models paradoxically exhibit greater class-level fairness disparities.
GF-Score addresses a critical gap in adversarial robustness evaluation by decomposing aggregate robustness metrics into per-class profiles, revealing hidden vulnerabilities that standard benchmarks obscure. Traditional certified robustness scores provide single aggregate numbers that mask how protection varies across different data classes, potentially creating safety blind spots in deployed systems. This framework employs welfare economics metrics—Robustness Disparity Index, Normalized Robustness Gini Coefficient, and Worst-Case Class Robustness—to quantify these disparities systematically.
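The exact definitions used by GF-Score are not given here, but the flavor of these disparity metrics can be sketched from standard welfare-economics formulas applied to a vector of per-class robustness scores. In the sketch below, the Gini coefficient and worst-case class robustness follow their textbook definitions; the `robustness_disparity_index` shown as a max–min gap is an illustrative assumption, not necessarily the paper's definition.

```python
import numpy as np

def robustness_gini(per_class: np.ndarray) -> float:
    """Normalized Gini coefficient over per-class robustness scores.

    0 means perfectly equal protection across classes; values closer
    to 1 mean robustness is concentrated in a few classes.
    """
    x = np.sort(per_class.astype(float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    # Standard Gini formula expressed over the sorted values
    index = np.arange(1, n + 1)
    return float((2 * index - n - 1) @ x / (n * x.sum()))

def worst_case_class_robustness(per_class: np.ndarray) -> float:
    """Robustness of the least-protected class (higher is better)."""
    return float(per_class.min())

def robustness_disparity_index(per_class: np.ndarray) -> float:
    """Illustrative stand-in: gap between the best- and
    worst-protected classes (the paper's RDI may differ)."""
    return float(per_class.max() - per_class.min())
```

With a perfectly uniform profile such as `[0.5, 0.5, 0.5, 0.5]`, the Gini is 0 and the disparity index is 0; a skewed profile like `[0.1, 0.9]` drives both metrics up while worst-case robustness collapses to 0.1, which is exactly the pattern an aggregate average would hide.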
The technical innovation lies in eliminating expensive adversarial attack requirements through a self-calibration procedure that tunes temperature parameters using clean accuracy correlations alone. This substantially reduces computational overhead while maintaining evaluation rigor. Testing across RobustBench models reveals counterintuitive findings: certain classes like "cat" show vulnerability in 76% of CIFAR-10 models, and robustness improvements sometimes increase class-level disparity rather than narrowing it.
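One way to picture an attack-free calibration of this kind is a grid search over temperatures that maximizes the correlation between a clean-data confidence proxy and per-class clean accuracy. The sketch below assumes the proxy is the temperature-scaled softmax confidence and that Pearson correlation is the selection criterion; GF-Score's actual procedure may differ in both respects.

```python
import numpy as np

def calibrate_temperature(logits: np.ndarray,
                          labels: np.ndarray,
                          temps=np.linspace(0.25, 4.0, 16)) -> float:
    """Pick the temperature whose per-class confidence proxy best
    correlates with per-class clean accuracy -- no adversarial
    attacks are run at any point.

    logits: (N, C) model outputs on clean data
    labels: (N,) integer class labels
    """
    n_classes = logits.shape[1]
    preds = logits.argmax(axis=1)
    # Per-class clean accuracy (the only supervision signal used)
    acc = np.array([(preds[labels == c] == c).mean()
                    for c in range(n_classes)])

    best_t, best_corr = float(temps[0]), -np.inf
    for t in temps:
        # Temperature-scaled softmax, computed stably
        z = logits / t
        p = np.exp(z - z.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        conf = p.max(axis=1)  # confidence as a cheap robustness proxy
        proxy = np.array([conf[labels == c].mean()
                          for c in range(n_classes)])
        corr = np.corrcoef(proxy, acc)[0, 1]
        if corr > best_corr:
            best_t, best_corr = float(t), corr
    return best_t
```

The appeal of this style of calibration is that everything it consumes (logits and labels on clean inputs) is already produced by a standard evaluation pass, which is what makes large-scale auditing cheap.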
For AI practitioners deploying neural networks in safety-critical applications—autonomous vehicles, medical imaging, biometric systems—this framework provides actionable diagnostics. Organizations can now identify which subpopulations or data classes their security guarantees fail to protect equally, addressing potential liability and fairness concerns. The attack-free auditing pipeline makes large-scale robustness assessment practical for production systems. These findings suggest that optimizing aggregate robustness metrics may inadvertently sacrifice protection for minority classes, requiring developers to explicitly incorporate fairness constraints into model training.
- GF-Score decomposes certified robustness into per-class profiles using fairness metrics from welfare economics to identify hidden vulnerability patterns.
- Self-calibration eliminates expensive adversarial attack requirements, making robustness evaluation computationally practical for deployed systems.
- Empirical testing reveals consistent class-level vulnerabilities across models, with certain classes like "cat" showing weakness in 76% of CIFAR-10 models.
- More robust models paradoxically exhibit greater class-level disparity, suggesting aggregate robustness optimization may sacrifice fairness.
- The framework enables attack-free auditing of safety-critical AI systems, with code released openly on GitHub for reproducible evaluation.