Researchers empirically test whether host intrusion detection systems trained on syscall traces can generalize across different CVE exploits within the same Common Weakness Enumeration class. Results show CWE-level generalization works for some weakness families (achieving F1=0.6976 for authentication flaws) but fails for others, with cross-CVE transfer heavily dependent on source profile breadth rather than weakness classification.
This research addresses a gap between how host intrusion detection systems (HIDS) are developed and how they operate in production. HIDS models are typically trained and tested against individual CVE instances in controlled settings, whereas defenders must recognize novel exploits that belong to known vulnerability classes. The researchers' systematic evaluation, using the LID-DS-2021 dataset and three CWE families, reveals important limitations in current detection methodologies.
The findings show that generalization capability varies widely across weakness types. The authentication vulnerability class (CWE-307) reached an F1 of 0.6976 when multiple CVEs were combined, while SQL injection and file upload vulnerabilities showed near-complete breakdown in cross-CVE scenarios. This asymmetry suggests that exploits in some vulnerability classes leave more consistent patterns in their syscall traces, while others exhibit greater behavioral diversity.
A particularly valuable contribution is the emphasis on calibrated false positive rates as a methodological requirement. Many security research papers omit this calibration, potentially reporting artificially inflated detection metrics that don't translate to operational settings. The researchers' finding that transfer effectiveness depends more on the breadth of the source normal profile than on CWE labels challenges conventional assumptions about vulnerability classification hierarchies.
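To make the calibration point concrete, here is a minimal sketch of fixing a detection threshold from attack-free validation scores so that the false positive rate stays at or below a chosen target, and only then computing F1. The function names, the "score above threshold means alarm" convention, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def calibrate_threshold(normal_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of normal scores exceed it.

    Assumes higher score = more anomalous and an alarm when score > threshold.
    """
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    scores = np.sort(np.asarray(normal_scores, dtype=float))
    n = scores.size
    # Number of normal samples that are allowed to raise an alarm.
    allowed = int(np.floor(target_fpr * n))
    # Everything strictly above this value alarms: at most `allowed` samples.
    return scores[n - allowed - 1]

def f1_at_threshold(scores, labels, threshold):
    """F1 for binary labels (1 = attack) at a fixed, pre-calibrated threshold."""
    preds = np.asarray(scores, dtype=float) > threshold
    labels = np.asarray(labels).astype(bool)
    tp = int(np.sum(preds & labels))
    fp = int(np.sum(preds & ~labels))
    fn = int(np.sum(~preds & labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data (hypothetical): 100 normal validation scores, then a small test set.
normal_val = np.arange(100)
thr = calibrate_threshold(normal_val, target_fpr=0.05)
test_scores = [10, 50, 96, 120, 130, 90]
test_labels = [0, 0, 0, 1, 1, 1]
f1 = f1_at_threshold(test_scores, test_labels, thr)
```

The design point is that the threshold is chosen without looking at attack data, so the reported F1 reflects an operating point a deployment could actually hold.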
For security practitioners and tool developers, this research indicates that CWE-level model sharing and reuse require empirical validation per weakness category rather than assumptions based on classification schemes. Organizations deploying HIDS should demand honest reporting of calibrated performance metrics and avoid overconfidence in cross-CVE detection claims without domain-specific validation.
- CWE-level generalization in HIDS is empirically possible for some but not all vulnerability classes, requiring case-by-case validation.
- Cross-CVE transfer performance depends primarily on source profile breadth rather than shared CWE classification.
- Calibrated false positive rates are essential for honest metric reporting in intrusion detection research.
- Authentication vulnerabilities (CWE-307) showed superior generalization compared to injection and upload weakness classes.
- Current syscall-based feature sets have inherent limitations in capturing consistent behavioral signatures across exploit variants.
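
The profile-breadth takeaway can be illustrated with a toy syscall n-gram detector in the style of classic sequence-based anomaly detection. This is a stand-in technique chosen for illustration, not necessarily the detector used in the study, and the traces, function names, and window size are all hypothetical. The sketch builds a "normal profile" from source traces and scores a target trace by the fraction of its n-grams that the profile has never seen.

```python
def ngram_windows(trace, n):
    """Return the list of length-n sliding windows over a syscall trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def build_profile(normal_traces, n=3):
    """Collect every n-gram observed in the attack-free source traces."""
    profile = set()
    for trace in normal_traces:
        profile.update(ngram_windows(trace, n))
    return profile

def anomaly_score(trace, profile, n=3):
    """Fraction of the trace's n-grams that are absent from the profile."""
    windows = ngram_windows(trace, n)
    if not windows:
        return 0.0
    unseen = sum(1 for w in windows if w not in profile)
    return unseen / len(windows)

# Hypothetical traces: the narrow source saw only read paths,
# the broad source also saw write paths.
narrow = build_profile([["open", "read", "close"]])
broad = build_profile([["open", "read", "close"],
                       ["open", "write", "close"]])

# A benign trace from a different (target) scenario.
benign_target = ["open", "write", "close"]
score_narrow = anomaly_score(benign_target, narrow)
score_broad = anomaly_score(benign_target, broad)
```

In this toy case the narrow profile flags the benign target trace as fully anomalous while the broad profile does not, which is one way a narrow source profile could inflate false positives on transfer regardless of whether source and target share a CWE label.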