
Multi-View Decompilation for LLM-Based Malware Classification

arXiv – CS AI | Bercan Turkmen, Vyas Raina | June 19, 2026 at 04:00 AM

🤖AI Summary

Researchers demonstrate that using multiple decompilers (Ghidra and RetDec) with large language models improves malware classification accuracy compared to single-decompiler approaches. By providing complementary pseudo-C views of the same binary, the multi-view strategy increases recall on malicious samples without requiring additional training, offering a practical enhancement for LLM-based malware triage.

Analysis

This research addresses a key weakness in current LLM-based malware detection pipelines: over-reliance on the output of a single decompiler. Decompilers are inherently lossy tools that apply heuristics to reconstruct source-like code from binaries, and each implementation makes different trade-offs between accuracy and readability. The study's core finding, that combining decompiler views improves malicious-class F1 mainly through higher recall, reflects a familiar principle in machine learning: diverse, partially independent signals often outperform any single source.
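Why does a recall gain lift malicious-class F1? F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN), so cutting missed malicious samples (false negatives) raises it even if precision slips a little. The sketch below computes all three from confusion counts; the counts are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for the malicious (positive) class."""
    tp: int  # malicious samples correctly flagged
    fp: int  # benign samples wrongly flagged as malicious
    fn: int  # malicious samples the classifier missed


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts over 100 malicious samples, for illustration only.
single_view = Confusion(tp=70, fp=10, fn=30)
multi_view = Confusion(tp=85, fp=14, fn=15)

for name, c in [("single-view", single_view), ("multi-view", multi_view)]:
    print(f"{name}: P={precision(c):.3f} R={recall(c):.3f} F1={f1(c):.3f}")
```

With these made-up counts, precision dips from 0.875 to about 0.859, but recall climbs from 0.70 to 0.85, so F1 rises from about 0.778 to about 0.854.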

The work comes amid growing adoption of LLMs in cybersecurity workflows, where these models have shown promise in assisting human analysts with code review and threat assessment. Previous research validated LLM performance on decompiled code but assumed a single-decompiler pipeline was sufficient. This study indicates that the assumption costs detection accuracy, since one decompiler can obscure evidence that another exposes.

For security teams and organizations deploying LLM-assisted malware analysis, this finding has immediate practical value. The approach requires no fine-tuning or retraining: feeding both Ghidra and RetDec outputs to an existing LLM is enough to produce measurable improvements. This low-friction implementation path makes adoption in production environments more likely.
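A minimal sketch of assembling such a multi-view input, assuming the Ghidra and RetDec pseudo-C for a binary has already been exported to strings. The function names, the instruction wording, the `<view>` tags, and the character budget are hypothetical choices for illustration, not the paper's prompt. The LLM call itself is left out because it depends on the provider's API.

```python
from typing import Dict

# Hypothetical instruction wording, not the prompt used in the paper.
INSTRUCTION = (
    "You are assisting a malware analyst. Two decompilers produced the pseudo-C "
    "below from the same binary. Each view may omit or distort details, so "
    "weigh both before answering. Reply with one word: MALICIOUS or BENIGN."
)


def build_multi_view_prompt(views: Dict[str, str], max_chars_per_view: int = 20_000) -> str:
    """Join labeled pseudo-C views of one binary into a single prompt string.

    `views` maps a decompiler name (e.g. "Ghidra", "RetDec") to its pseudo-C
    output. Each view is cut to `max_chars_per_view` characters, a crude
    stand-in for a real token budget (hypothetical default).
    """
    if not views:
        raise ValueError("at least one decompiler view is required")
    sections = [INSTRUCTION]
    for name, code in views.items():  # dict insertion order sets the view order
        sections.append(f'<view decompiler="{name}">\n{code[:max_chars_per_view]}\n</view>')
    return "\n\n".join(sections)


def parse_label(reply: str) -> str:
    """Map the model's free-text reply to a label; anything else is UNKNOWN."""
    text = reply.strip().upper()
    for label in ("MALICIOUS", "BENIGN"):
        if text.startswith(label):
            return label
    return "UNKNOWN"


# Placeholder pseudo-C; real input would come from each decompiler's export.
ghidra_c = "void FUN_00401000(void) { /* ... */ }"
retdec_c = "int32_t function_401000(void) { /* ... */ }"
prompt = build_multi_view_prompt({"Ghidra": ghidra_c, "RetDec": retdec_c})
```

Giving each view its own labeled section lets the model tell the two reconstructions apart; whether that labeling helps, and how much of each view fits in context, are assumptions worth testing on a given model.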

The complementary error patterns of the two decompilers point to several directions for future research: weighted ensemble methods, decompiler-specific prompting strategies, or dynamic decompiler selection based on binary characteristics could improve performance further. As malware analysts rely more on LLMs, understanding such multi-view fusion approaches will matter for keeping detection effective against evolving threats.
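Weighted ensembles are listed as a possible next step, not as the method evaluated; as summarized here, the study places both views in one prompt. As a purely hypothetical sketch of the weighted alternative, assume the LLM is queried once per decompiler view and returns a probability that the binary is malicious. The per-view scores, the weights, the 0.5 threshold, and the function names are all illustrative assumptions.

```python
from typing import Dict


def weighted_malicious_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-view malicious probabilities.

    Views absent from `weights` get weight 0; weights need not sum to 1.
    """
    total = sum(weights.get(view, 0.0) for view in scores)
    if total <= 0:
        raise ValueError("at least one scored view needs a positive weight")
    return sum(p * weights.get(view, 0.0) for view, p in scores.items()) / total


def classify(scores: Dict[str, float], weights: Dict[str, float], threshold: float = 0.5) -> str:
    """Flag the binary as malicious when the fused score reaches the threshold."""
    return "MALICIOUS" if weighted_malicious_score(scores, weights) >= threshold else "BENIGN"


# Hypothetical per-view outputs for one binary (one LLM query per decompiler view).
scores = {"Ghidra": 0.2, "RetDec": 0.7}

# Hypothetical weightings; in practice they might be tuned on a labeled validation set.
equal_weights = {"Ghidra": 1.0, "RetDec": 1.0}
retdec_heavy = {"Ghidra": 1.0, "RetDec": 3.0}
```

With equal weights the fused score is 0.45 and the binary is labeled benign; weighting the RetDec view three times as heavily lifts it to 0.575 and flips the label, which shows how much the weights, and the data used to choose them, would matter.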

Key Takeaways

→Multi-decompiler prompting improves LLM malware classification F1 scores without model retraining
→Ghidra and RetDec expose different binary artifacts, providing complementary evidence for malware detection
→Higher recall on malicious samples means fewer real threats are missed in practical deployments
→The approach is training-free and immediately deployable in existing security analysis pipelines
→Ensemble decompiler analysis demonstrates that decompiler choice materially impacts LLM-based threat detection