SUSD: Structured Unsupervised Skill Discovery through State Factorization
SUSD is an unsupervised skill discovery framework that factorizes the state space into independent components to learn diverse, dynamic skills without extrinsic rewards. It allocates distinct skill variables to different environmental factors and uses a dynamic model to guide exploration toward underexplored factors. The authors report that it discovers more complex, compositional behaviors than existing MI-based and distance-maximizing approaches.
SUSD addresses a limitation of existing unsupervised skill discovery methods: approaches that maximize the mutual information (MI) between skills and states tend to converge on simple, static skills with little practical use. The MI objective has invariance properties that give dynamic behaviors no advantage over static ones, as long as the skills stay distinguishable. Distance-maximizing alternatives encourage more dynamic skills but still fail to engage all controllable factors in the environment. This matters because richer skill repertoires learned without supervision can make reinforcement learning more efficient and scalable in robotics, autonomous agents, and game environments.
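For readers who want the invariance argument spelled out, the MI objective is commonly written as below. This is standard notation from the skill discovery literature rather than anything quoted in this summary: S is the state, Z the skill variable, and q_phi an assumed learned skill discriminator.

```latex
% MI between states and skills, with the usual variational lower bound
I(S; Z) = H(Z) - H(Z \mid S)
        \;\ge\; H(Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi_z}\!\left[\log q_\phi(z \mid s)\right]

% Invariance: re-encoding the state with any invertible map f leaves MI unchanged
I(f(S); Z) = I(S; Z)
```

Because of the second identity, skills that differ only by tiny but distinguishable state changes can score as well as skills that move the agent far, which is one way to read the bias toward static behaviors described above.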
The paper's core idea is to exploit the compositional structure found in many environments. SUSD factorizes the state space into independent components, such as individual objects or controllable entities, and allocates a separate skill variable to each factor, which gives finer control over the skill discovery process. A dynamic model tracks learning progress across these factors and adaptively redirects the agent's focus toward underexplored ones, which helps prevent premature convergence to a narrow skill set. The result is a disentangled skill representation that decomposes complex behaviors into per-factor components.
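The mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the factor layout (`FACTOR_SLICES`), the per-factor `progress` scores, the softmax weighting, and the use of raw state differences in place of a learned representation are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical layout: a flat state whose entries are grouped by factor.
# Each slice marks the coordinates of one factor (e.g. one object).
FACTOR_SLICES = [slice(0, 2), slice(2, 4)]


def sample_skill(factor_slices, rng):
    """Draw one skill sub-vector per factor, sized to match that factor."""
    return [rng.standard_normal(sl.stop - sl.start) for sl in factor_slices]


def factor_weights(progress, temperature=1.0):
    """Softmax over negative progress, so lagging factors get more weight.

    `progress` is an assumed per-factor score (higher = better explored).
    """
    logits = -np.asarray(progress, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def intrinsic_reward(state, next_state, skill, factor_slices, weights):
    """Weighted sum over factors of (change in factor) dot (factor's skill).

    Raw state differences stand in for a learned representation here.
    """
    total = 0.0
    for k, sl in enumerate(factor_slices):
        delta = next_state[sl] - state[sl]
        total += weights[k] * float(delta @ skill[k])
    return total


rng = np.random.default_rng(0)
skill = sample_skill(FACTOR_SLICES, rng)
weights = factor_weights(progress=[0.9, 0.1])  # second factor is less explored
state = np.zeros(4)
next_state = np.array([1.0, 0.0, 0.0, 1.0])
print(weights, intrinsic_reward(state, next_state, skill, FACTOR_SLICES, weights))
```

In this sketch the second factor receives the larger weight because its assumed progress score is lower, so movement along that factor contributes more to the reward. How SUSD actually measures progress and shapes its reward is not specified in this summary.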
For the AI and reinforcement learning community, the practical payoff is the factorized skill representation itself. It provides compositional building blocks that a hierarchical reinforcement learning (HRL) controller can combine to solve downstream tasks. Experiments in environments with 1 to 10 factors report improvements over baseline methods, which suggests the approach scales with the number of factors. The code is publicly available, which supports reproducibility.
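To make the HRL use case concrete, here is a toy sketch of a two-level control loop in which a high-level policy picks one skill per factor and a frozen low-level policy turns those skills into actions. Everything in it is hypothetical: the function names, the one-dimensional factors, the hand-written policies, and the toy dynamics are stand-ins chosen so the example runs, not components of SUSD.

```python
import numpy as np

FACTOR_SLICES = [slice(0, 1), slice(1, 2)]  # two 1-D factors (assumed layout)
GOAL = np.array([4.0, -2.0])                # toy downstream task: reach GOAL


def high_level_policy(state):
    """Pick one skill per factor: move each factor toward its goal value."""
    return [np.sign(GOAL[sl] - state[sl]) for sl in FACTOR_SLICES]


def low_level_policy(state, skill):
    """Stand-in for a frozen skill policy: the skill directly sets the action."""
    return np.concatenate(skill)


def env_step(state, action):
    """Toy dynamics: the action displaces the state; reward is negative distance."""
    next_state = state + action
    return next_state, -float(np.linalg.norm(next_state - GOAL))


def run_hierarchy(state, horizon, skill_duration):
    """Re-select skills every `skill_duration` steps and execute them in between."""
    total_reward = 0.0
    for t in range(horizon):
        if t % skill_duration == 0:
            skill = high_level_policy(state)
        state, reward = env_step(state, low_level_policy(state, skill))
        total_reward += reward
    return state, total_reward


final_state, total_reward = run_hierarchy(np.zeros(2), horizon=6, skill_duration=2)
print(final_state, total_reward)
```

The point of the sketch is the interface: because each factor has its own skill variable, the high-level policy can steer factors independently instead of searching over a single entangled skill vector.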
Open questions include how SUSD scales to environments with many more factors, whether the approach generalizes across different kinds of domains, and how well factorized representations transfer between related tasks. Combining it with reward-based training could also support hybrid pipelines that pair unsupervised discovery with targeted optimization.
- SUSD factorizes the state space into independent components to discover diverse, dynamic skills without extrinsic rewards.
- The framework is reported to outperform existing MI-based and distance-maximizing unsupervised skill discovery methods in factorized environments.
- Disentangled skill representations enable efficient hierarchical reinforcement learning on compositional downstream tasks.
- A dynamic model tracks learning progress and adaptively guides exploration toward underexplored environmental factors.
- Experiments cover environments with 1 to 10 factors, and the code is publicly available.