SOHET: Sequence Of Heterogeneous Events Transformer with Self-Supervised Pre-Training
Researchers introduce SOHET, a transformer-based architecture for processing heterogeneous event streams with self-supervised pre-training capabilities. The model demonstrates significant performance improvements on fraud detection and sequential prediction tasks, outperforming existing methods by 5.8% on a large-scale benchmark while achieving faster convergence.
SOHET addresses a fundamental challenge in machine learning: processing diverse, timestamped events that vary in type and structure. The architecture combines event-specific encoders with temporal embeddings and transformer mechanisms, enabling both causal (real-time) and bidirectional (retrospective) analysis. This flexibility matters because real-world applications require different modes: fraud detection systems need causal predictions as transactions occur, while batch analysis can use the complete temporal context.
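To make the two modes concrete, here is a minimal sketch of the general idea, not the authors' implementation. It routes each event to a type-specific encoder, adds a time embedding, and builds either a causal or a bidirectional attention mask. The event types (`login`, `payment`), their fields, the hand-written encoders, the sinusoidal time embedding, and the width `D_MODEL` are all assumptions made for illustration; the paper's actual encoders and embeddings may differ.

```python
import math

D_MODEL = 4  # shared embedding width, chosen only for illustration


def encode_login(fields):
    # Hypothetical encoder for "login" events (not from the paper).
    return [float(fields["failed_attempts"]), 0.0, 0.0, 1.0]


def encode_payment(fields):
    # Hypothetical encoder for "payment" events (not from the paper).
    return [math.log1p(fields["amount"]), float(fields["is_new_card"]), 1.0, 0.0]


# Each event type gets its own encoder, but all map into the same width.
ENCODERS = {"login": encode_login, "payment": encode_payment}


def time_embedding(t, d_model=D_MODEL):
    # Sinusoidal embedding of a timestamp; an assumed stand-in for the
    # temporal embedding described in the text.
    out = []
    for i in range(d_model // 2):
        freq = 1.0 / (10000 ** (2 * i / d_model))
        out.extend([math.sin(t * freq), math.cos(t * freq)])
    return out


def embed_sequence(events):
    # Route each event to its type-specific encoder, then add the time embedding.
    rows = []
    for ev in events:
        x = ENCODERS[ev["type"]](ev["fields"])
        te = time_embedding(ev["t"])
        rows.append([a + b for a, b in zip(x, te)])
    return rows


def attention_mask(n, causal):
    # mask[i][j] is True when position i may attend to position j.
    # Causal: only the past and the current event. Bidirectional: everything.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


events = [
    {"type": "login", "t": 0.0, "fields": {"failed_attempts": 2}},
    {"type": "payment", "t": 30.0, "fields": {"amount": 120.0, "is_new_card": 1}},
    {"type": "login", "t": 45.0, "fields": {"failed_attempts": 0}},
]

tokens = embed_sequence(events)
causal_mask = attention_mask(len(events), causal=True)
bidirectional_mask = attention_mask(len(events), causal=False)
```

In the causal mask the first event can attend only to itself, which is what a real-time fraud check needs. In the bidirectional mask every event sees the whole sequence, which suits retrospective batch analysis.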
The research builds on growing recognition that self-supervised pre-training substantially improves model performance on downstream tasks. By introducing three pre-training objectives designed for causal settings, the authors show that heterogeneous event sequences benefit from representation learning before task-specific fine-tuning. The reported 2.6% performance gain and 2.4% faster convergence from pre-training support this approach.
The practical significance is clearest in the Booking.com fraud detection results, where SOHET handles 17 event types in real-world transaction data. Financial services, healthcare, and security systems all depend on heterogeneous event stream analysis, so architectural improvements apply directly to high-stakes domains. SOHET's ability to outperform specialized baselines (FlexTPP, NAPPT, CIPPT) suggests a general-purpose advantage.
Looking forward, adoption depends on implementation accessibility and computational efficiency in production environments. The EBES benchmark results indicate strong generalization, but practitioners need clarity on inference latency, memory requirements, and integration with existing event streaming infrastructure. Open-sourcing the code would accelerate adoption across fraud detection, anomaly detection, and predictive maintenance applications.
- SOHET achieves a 5.8% performance improvement over existing methods on large-scale fraud detection with 17 event types
- Self-supervised pre-training adds a 2.6% gain and reduces convergence time by 2.4%, demonstrating the benefits of representation learning
- The architecture handles both causal (real-time) and bidirectional (batch) prediction modes, enabling flexible deployment scenarios
- It matches or exceeds published best results on 6 of 8 EBES benchmark tasks, indicating strong generalization capability
- It is directly applicable to fraud detection, financial risk assessment, and other heterogeneous event stream applications