TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion
Researchers introduce TrojanGYM, an LLM-driven framework that automatically generates hardware Trojans to expose vulnerabilities in detection systems. The system demonstrates that existing detectors can be evaded at rates up to 83.33%, revealing critical gaps in hardware security testing methodologies.
TrojanGYM addresses a fundamental asymmetry in hardware security: detection systems trained on limited, stylized benchmarks fail against novel attack patterns. By deploying multiple LLM agents (GPT-4, LLaMA-3.3-70B, Gemini-2.5Pro) in an adversarial feedback loop with GNN-based detectors, the researchers simulate how attackers might discover and exploit blind spots in deployed security tools. This detector-in-the-loop approach mirrors red-teaming methodologies gaining traction across AI safety, but applies them to semiconductor supply chain security.
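The feedback loop described above can be illustrated with a minimal sketch. This is not the researchers' implementation: the function names, the trigger "styles", and the toy detector are all hypothetical stand-ins for the LLM agents and the GNN-based detector, chosen only to show how a generator can use detector verdicts to steer its next attempt.

```python
from typing import Callable, List, Optional


def detector_in_the_loop(
    generate: Callable[[List[str]], str],
    detect: Callable[[str], bool],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate the detector misses, or None if all are caught."""
    feedback: List[str] = []  # candidates the detector has already flagged
    for _ in range(max_rounds):
        candidate = generate(feedback)
        if not detect(candidate):
            return candidate  # the detector missed this one
        feedback.append(candidate)  # caught: pass the verdict back to the generator
    return None


# Hypothetical stand-ins: a real system would call an LLM and a trained detector.
STYLES = ["counter_trigger", "comparator_trigger", "fsm_trigger"]


def toy_generate(feedback: List[str]) -> str:
    # Try a different trigger style after each rejection.
    return STYLES[len(feedback) % len(STYLES)]


def toy_detect(design: str) -> bool:
    # This toy detector only recognizes two of the three styles.
    return design in {"counter_trigger", "comparator_trigger"}


result = detector_in_the_loop(toy_generate, toy_detect)
print(result)
```

In this toy run the first two candidates are flagged and the third slips through, which is the kind of blind spot the loop is meant to surface.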
The findings underscore a critical vulnerability in modern hardware verification pipelines. Detectors appear effective when evaluated only on existing benchmarks such as TrustHub, yet TrojanGYM-generated Trojans evade them at rates of up to 83.33%, which suggests the detectors fit specific trigger patterns rather than learning generalizable security properties. Robust-GNN4TJ raises the detection rate on adversarial benchmarks from 0% to 60%, yet substantial gaps remain.
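For readers unfamiliar with the metric, an evasion rate is simply the share of inserted Trojans that the detector fails to flag. The sketch below assumes a hypothetical list of per-sample detector verdicts; the sample counts are illustrative only, since the summary does not state how many designs lie behind the reported figures.

```python
from typing import Sequence


def evasion_rate(detected: Sequence[bool]) -> float:
    """Percentage of Trojan samples the detector failed to flag."""
    if not detected:
        raise ValueError("need at least one sample")
    missed = sum(1 for flagged in detected if not flagged)
    return 100.0 * missed / len(detected)


# Illustrative verdicts (hypothetical): 5 of 6 Trojans go undetected.
verdicts = [False, False, False, False, False, True]
rate = round(evasion_rate(verdicts), 2)
print(rate)  # 83.33
```

Under this definition, the detection rate is the complement of the evasion rate, so 60% detection corresponds to 40% evasion.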
This research has immediate implications for semiconductor manufacturers, defense contractors, and cloud providers that rely on GNN-based hardware Trojan detection. It suggests that current hardware security certifications may provide false confidence. The systematic exposure of detector vulnerabilities creates pressure to adopt more adversarial testing methods before deployment. The planned release of code and artifacts will likely accelerate adoption of TrojanGYM-style benchmarking across the industry, potentially reshaping hardware verification standards.
- LLM-driven hardware Trojan generation exposes major blind spots in existing detection systems, with evasion rates reaching 83.33%
- Current hardware security benchmarks are insufficient; detectors overfit to narrow patterns rather than learning generalizable security properties
- Robust-GNN4TJ improves detection of adversarial Trojans from 0% to 60%, but significant gaps persist
- Detector-in-the-loop feedback mechanisms provide a systematic methodology for discovering and patching hardware security vulnerabilities
- Supply chain security for semiconductors requires immediate adoption of adversarial testing practices before current certifications can be trusted