
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces

arXiv – CS AI | Liangwei Yang, Jielin Qiu, Zixiang Chen, Ming Zhu, Juntao Tan, Zhiwei Liu, Wenting Zhao, Zhujun Lan, Akshara Prabhakar, Silvio Savarese, Huan Wang, Shelby Heinecke
🤖AI Summary

Researchers introduce BehaviorBench, a benchmark dataset for evaluating AI systems that predict user financial decisions using real-world behavioral data from prediction markets and blockchain records. The benchmark contains over 1.4 million trade instances and 141,000 belief predictions across 2,000 wallets, enabling more accurate assessment of personalized decision-modeling systems compared to simulation-based approaches.

Analysis

BehaviorBench addresses a critical gap in AI evaluation methodology by leveraging authentic on-chain and prediction market data instead of model-generated synthetic behavior. Prior research has demonstrated systematic divergence between simulated and human decision-making patterns, making real-world behavioral traces essential for developing trustworthy decision-support systems. This benchmark reconstructs complete wallet-level decision histories, providing granular transaction-level data that captures the complexity of actual user behavior in financial markets.

The dual-task structure (belief prediction and trade prediction) separates two layers of financial decision-making. Belief prediction measures whether systems can infer user confidence and market stance, while trade prediction tests the ability to forecast specific transaction behavior. Across frontier and open-weight language models, the researchers found that personalization improves performance inconsistently, and that model rankings change with the task and the evaluation metric. This variation indicates that no single approach dominates across contexts.

For the broader AI and crypto communities, BehaviorBench establishes a rigorous evaluation framework for personalized systems operating in decentralized environments. As AI increasingly mediates financial decision-making, realistic benchmarks help prevent overestimating model capabilities during development. The disjoint support pools and the multiple history interfaces (raw history, generated profiles, retrieved evidence) expose different failure modes, helping researchers identify when models genuinely understand user behavior versus when they exploit spurious patterns.

Future work should expand BehaviorBench across different market types and user demographics, ensuring models generalize beyond prediction markets. Integration with DeFi protocols could enable real-time validation of these systems.

Key Takeaways
  • BehaviorBench provides 1.48 million real trade instances from actual blockchain and prediction market data, replacing unreliable synthetic user simulations.
  • Personalization improves belief prediction more consistently than trade prediction, suggesting current models struggle with granular transaction-level forecasting.
  • Model rankings shift across task layers and metrics, indicating no universal personalization approach for financial decision prediction.
  • Different history interfaces (raw history, generated profiles, retrieved evidence) expose distinct failure modes in personalized systems.
  • The benchmark establishes rigorous evaluation standards for AI systems supporting financial decisions in decentralized markets.
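
The three history interfaces named above can be pictured as three ways of turning the same wallet history into model context. The sketch below is illustrative only: the `Trade` fields, the function names, the count-based profile, and the keyword-overlap retrieval are all assumptions made for this example and are not taken from the BehaviorBench paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # All fields are hypothetical stand-ins for a wallet-level trade record.
    market: str   # market question text
    side: str     # "buy" or "sell"
    size: float   # trade size in arbitrary units


def _render(trade):
    """Format one trade as a short line of text."""
    return f"{trade.side} {trade.size:g} on '{trade.market}'"


def raw_history(trades, limit=5):
    """Interface 1 (assumed form): pass the most recent trades through verbatim."""
    return [_render(t) for t in trades[-limit:]]


def generated_profile(trades):
    """Interface 2 (assumed form): compress the history into a short summary.

    A real system might ask a language model to write the profile;
    simple counts stand in for that step here.
    """
    sides = Counter(t.side for t in trades)
    mean_size = sum(t.size for t in trades) / len(trades) if trades else 0.0
    return (f"{len(trades)} trades; buys={sides['buy']}, "
            f"sells={sides['sell']}; mean size={mean_size:g}")


def retrieved_evidence(trades, query, k=2):
    """Interface 3 (assumed form): keep the k trades most related to the query.

    Keyword overlap stands in for a real retriever.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        trades,
        key=lambda t: len(query_words & set(t.market.lower().split())),
        reverse=True,
    )
    return [_render(t) for t in ranked[:k]]


history = [
    Trade("Will BTC close above 100k", "buy", 10),
    Trade("Will ETH flip BTC", "sell", 5),
    Trade("Will it rain in Paris", "buy", 2),
]
print(raw_history(history, limit=2))
print(generated_profile(history))
print(retrieved_evidence(history, "BTC close 150k", k=1))
```

Each function hands the model a different slice of the same history, which is one plausible reason the interfaces would fail in different ways: raw history can bury the relevant trade, a profile can discard it, and retrieval can surface the wrong one.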