y0news
🤖 AI × Crypto | Neutral | Importance 7/10

CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering

arXiv – CS AI | Baicheng Chen, Yu Wang, Ziheng Zhou, Xiangru Liu, Juanru Li, Yilei Chen, Tianxing He
🤖AI Summary

Researchers introduced CREBench, a benchmark for evaluating large language models on cryptographic binary reverse engineering. The best-performing model, GPT-5.4, scored 64.03 out of 100 and recovered the flag in 59% of challenges, while human experts scored 92.19, so AI still lags behind human expertise in cryptographic analysis tasks.

Key Takeaways
  • CREBench benchmark comprises 432 challenges across 48 cryptographic algorithms and 3 difficulty levels to test LLM reverse engineering capabilities.
  • GPT-5.4 was the top-performing model, recovering flags in 59% of challenges with a score of 64.03 out of 100.
  • Human experts set a baseline score of 92.19 points, 28.16 points above the best model.
  • The research targets a gap: LLMs' capabilities in cryptographic binary reverse engineering had not been systematically explored.
  • Results indicate substantial room for improvement in AI-assisted cryptographic security analysis and vulnerability discovery.