TokenMinds: Pretrained User Tokens and Embeddings for User Understanding in Large Recommender Systems
Google researchers introduce TokenMinds, a system that generates both discrete Semantic ID tokens and dense embeddings for user modeling in large-scale recommender systems. Deployed across YouTube's services, which handle billions of users, the approach demonstrates that semantically grounded user tokens complement traditional dense embeddings while reducing computational overhead through a vocabulary shared across content formats.
TokenMinds addresses a fundamental challenge in modern recommendation systems: how to represent users in ways that are both semantically interpretable and computationally efficient at scale. Traditional dense embeddings, while effective, suffer from fixed-dimensional constraints that limit expressiveness, while existing token-based approaches using LLMs fail to ground representations in actual item attributes. The system extends prior work on Semantic ID (SID) based item tokenization to user modeling, leveraging an encoder-decoder architecture adapted from pre-trained language models to simultaneously produce discrete tokens and dense vectors.
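To make the dual output concrete, here is a minimal sketch of a single function that returns both a discrete token sequence and a dense vector for one user. It is not the TokenMinds model: the summary does not say how tokens are derived, so mean pooling stands in for the learned encoder and residual quantization (a common way to build Semantic IDs) stands in for the token decoder. The names `user_representation`, `nearest`, and the toy codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def user_representation(
    history: Sequence[Vector], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Return (discrete tokens, dense embedding) for one user.

    Assumption: `history` holds item vectors the user interacted with.
    """
    dim = len(history[0])
    # Dense output: mean of item vectors (stand-in for a learned encoder).
    dense = [sum(item[d] for item in history) / len(history) for d in range(dim)]
    # Discrete output: quantize the remaining residual level by level.
    tokens: List[int] = []
    residual = list(dense)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, dense


# Toy data: two watched items and a two-level codebook (all hypothetical).
history = [[1.0, 0.0], [0.8, 0.4]]
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],    # coarse level
    [[0.0, 0.0], [-0.1, 0.2]],   # fine level
]
tokens, dense = user_representation(history, codebooks)
```

In this toy setup the token sequence is a coarse-to-fine code for the same user that the dense vector describes, which is one way to picture why the two outputs can be complementary rather than redundant.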
The work is notable for its industrial deployment across YouTube at production scale. By unifying long-form and short-form video behaviors into a single model vocabulary, TokenMinds reduces both training costs and serving infrastructure complexity, a critical advantage for systems serving billions of users. The asynchronous architecture, which decouples representation generation from downstream scoring, allows flexible integration with existing ranking pipelines without a wholesale system redesign.
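The decoupling described above can be pictured with a small sketch in which one job writes user representations to a shared store and the ranker only reads from it at request time. Everything here is an assumption made for illustration: `ReprStore`, `refresh_job`, `score`, and the dot-product scorer are hypothetical names and choices, not components described in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class UserRepr:
    tokens: List[int]        # discrete user tokens
    embedding: List[float]   # dense user embedding


class ReprStore:
    """In-memory stand-in for a key-value store shared by both sides."""

    def __init__(self) -> None:
        self._data: Dict[str, UserRepr] = {}

    def put(self, user_id: str, rep: UserRepr) -> None:
        self._data[user_id] = rep

    def get(self, user_id: str) -> Optional[UserRepr]:
        return self._data.get(user_id)


def refresh_job(
    store: ReprStore, user_ids: Iterable[str], model: Callable[[str], UserRepr]
) -> None:
    """Producer side: runs on its own schedule, outside the request path."""
    for uid in user_ids:
        store.put(uid, model(uid))


def score(store: ReprStore, user_id: str, item_embedding: List[float]) -> float:
    """Consumer side: the ranker reads a stored representation, never the model."""
    rep = store.get(user_id)
    if rep is None:
        return 0.0  # assumed fallback for users with no stored representation
    return sum(u * i for u, i in zip(rep.embedding, item_embedding))


# Usage with a fake model that returns a fixed representation.
store = ReprStore()
refresh_job(store, ["u1"], lambda uid: UserRepr(tokens=[3, 7], embedding=[0.5, 0.5]))
known = score(store, "u1", [1.0, 2.0])     # 0.5*1.0 + 0.5*2.0 = 1.5
unknown = score(store, "u2", [1.0, 2.0])   # no stored entry, falls back to 0.0
```

Because the scorer depends only on the store's contents, the user model can be retrained or replaced without touching the ranking code, which is the integration benefit the paragraph describes.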
For the recommendation and AI infrastructure sectors, TokenMinds offers evidence that hybrid representations can outperform single-representation solutions. The complementary benefits of discrete tokens (interpretability, semantic grounding) and dense embeddings (compatibility, nuanced similarity) suggest that future systems will increasingly adopt multi-output architectures. The approach also limits technical debt, because teams are not forced to migrate away from proven dense embedding workflows.
The work demonstrates that token-based user modeling has matured beyond experimental status. Organizations building recommendation systems now have evidence that semantic discretization scales in practice, which will likely accelerate adoption of SID-based approaches beyond Google's ecosystem.
- TokenMinds generates both semantic user tokens and dense embeddings simultaneously, providing complementary benefits for recommendation systems.
- The system unifies long-form and short-form video behaviors through shared SID vocabulary, significantly reducing training and serving costs.
- Industrial deployment across YouTube's full user traffic (billions of users) confirms practical viability of SID-based representations at production scale.
- Hybrid token-plus-embedding approach enables seamless integration with existing ranking systems without architectural redesign.
- Results validate that semantically grounded discrete tokens outperform traditional text-based LLM tokens while maintaining dense embedding compatibility.