Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models
Researchers introduce UniTok, a universal tokenizer that converts continuous time series data into discrete tokens, and UniTok-FM, a foundation model pretrained on those tokens via next-token prediction. This unified approach supports forecasting, generation, and classification without task-specific modifications, performs competitively with specialized models, and supports zero-shot and few-shot inference.
UniTok addresses a fundamental challenge in applying large language model architectures to time series data: how to discretize unbounded continuous signals while preserving temporal structure and patterns. The tokenizer combines vector quantization with prefix normalization and a progressive-resolution causal architecture, so that standard LLM training objectives can be applied to sequential temporal data. This brings time series modeling into the same training framework as language modeling, two domains that previously developed separately.
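To make the idea of discretizing a continuous signal concrete, the following is a minimal sketch of prefix normalization followed by nearest-neighbor vector quantization. It is not UniTok's implementation: the function names, the patch length, the prefix length, and the randomly initialized codebook (standing in for a learned one) are all assumptions made for illustration, and the progressive-resolution component is not modeled.

```python
import numpy as np

def prefix_normalize(x, prefix_len):
    """Scale a series using the mean and std of its first prefix_len points only."""
    prefix = x[:prefix_len]
    mu = prefix.mean()
    sigma = prefix.std() + 1e-8  # guard against a constant prefix
    return (x - mu) / sigma, mu, sigma

def quantize(x_norm, codebook, patch_len):
    """Map each non-overlapping patch to the index of its nearest codebook vector."""
    n_patches = len(x_norm) // patch_len
    patches = x_norm[: n_patches * patch_len].reshape(n_patches, patch_len)
    # Squared Euclidean distance from every patch to every code vector.
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens, codebook, mu, sigma):
    """Look up code vectors and undo the prefix normalization."""
    return codebook[tokens].reshape(-1) * sigma + mu

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 4))  # hypothetical stand-in for a learned codebook
series = 100.0 + 5.0 * np.sin(np.linspace(0, 6 * np.pi, 32))

x_norm, mu, sigma = prefix_normalize(series, prefix_len=8)
tokens = quantize(x_norm, codebook, patch_len=4)
recon = dequantize(tokens, codebook, mu, sigma)
```

Because the statistics come only from the prefix, the normalization never looks at future values, and rescaling the raw series leaves the normalized values (and therefore the tokens) essentially unchanged. With a random codebook the reconstruction is poor; a trained codebook would be needed for faithful decoding.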
The significance lies in consolidation. Prior work required separate foundation models for forecasting, generation, and classification, each with its own architecture and training procedure. UniTok-FM removes this fragmentation by performing next-token prediction on context windows that contain multiple related time series, so the model learns shared dynamics rather than isolated patterns. This contextual pretraining mirrors successful LLM strategies and applies them to temporal data across domains.
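The context-window idea can be sketched as follows: token sequences from related series are placed in one context, and training pairs are formed by shifting that context by one position. The separator token, its id, and the helper names are hypothetical; the summary does not say how UniTok-FM delimits series within a window.

```python
SEP = 64  # hypothetical separator id, one past a 64-entry codebook

def build_context(token_seqs, sep_id=SEP):
    """Concatenate token sequences of related series, each followed by sep_id."""
    context = []
    for seq in token_seqs:
        context.extend(seq)
        context.append(sep_id)
    return context

def next_token_pairs(context):
    """Inputs are all tokens but the last; targets are the same tokens shifted by one."""
    return context[:-1], context[1:]

# Two made-up token sequences standing in for related, already-tokenized series.
related = [[3, 17, 17, 42], [5, 17, 40, 42]]
context = build_context(related)
inputs, targets = next_token_pairs(context)
```

Under this layout, when the model predicts tokens of the second series it can attend to the first, which is one way shared dynamics could be learned from a single next-token objective.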
For practitioners and enterprises, the practical deployment advantages are threefold: zero-shot forecasting reduces labeling requirements, in-context learning allows adaptation without retraining, and a single unified model lowers infrastructure complexity. The approach performs competitively with specialized baselines while remaining general, a balance that is typically reached only through task-specific engineering.
The implications extend beyond pure performance metrics. As time series applications proliferate across finance, IoT, energy, and healthcare, standardized foundation models lower the barriers to implementation. Because the method is architecture-agnostic, improvements in LLM efficiency should carry over directly to time series applications. Future work will likely involve scaling these models and exploring multimodal inputs that combine time series with textual or event-based data.
- UniTok converts continuous time series into discrete tokens using vector quantization with scale-stabilization techniques, enabling standard LLM training approaches.
- A single foundation model handles forecasting, generation, and classification without task-specific modifications, reducing development complexity.
- The model supports zero-shot and few-shot inference through in-context learning, reducing the need for fine-tuning on new time series datasets.
- Pretraining on context windows of related series captures shared temporal dynamics and outperforms training on isolated series.
- The unified architecture performs competitively with specialized foundation models while remaining general across multiple downstream tasks.