Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities
Researchers introduce Quotient DAGs, a novel framework for off-policy evaluation that addresses variance issues in importance sampling by recognizing when generation process details are irrelevant to evaluation targets. The method computes exact unordered slate propensities efficiently through Forward-DP, a dynamic programming approach that avoids factorial enumeration, enabling practical evaluation for autoregressive slate recommendation systems.
This research addresses a computational and statistical challenge in off-policy evaluation, a technique for assessing a policy's performance from logged data without costly live experiments. Traditional importance sampling weights every detail of the generation process, which adds unnecessary variance when the downstream reward depends only on a subset of those details. This is common in recommendation systems, where items are generated in a particular order but the resulting slate is evaluated as an unordered set. The Quotient DAG framework handles this by merging histories that are equivalent for evaluation purposes and computing forward-flow ratios between the target and behavior policies on the resulting condensed graph.
The Forward-DP algorithm matters most for slate recommendation, where autoregressive generation produces ordered sequences but evaluation considers only unordered sets. The exact propensity of an unordered slate is the sum of the propensities of all its orderings, so direct enumeration takes factorial time in the slate size and becomes intractable at scale. By operating on a subset-DAG instead, Forward-DP achieves polynomial complexity while remaining exact. This lets practitioners run propensity-based model selection and evaluation for real-world recommender systems without approximation error or prohibitive computational cost.
The work connects theory and practice by providing a principled primitive for systems that generate items sequentially but evaluate them as sets. For recommendation platforms, healthcare systems, and other domains that rely on off-policy evaluation, it reduces both computational overhead and statistical noise. More broadly, the research shows how separating the variance that is relevant to the evaluation target from nuisance variance leads to algorithms that scale to practical applications.
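The idea of summing over orderings on a subset-DAG, rather than enumerating them, can be illustrated with a minimal Python sketch. It is not the paper's Forward-DP implementation. It assumes a hypothetical policy whose next-item probability depends only on the set of items already selected (a Plackett-Luce sampler stands in for it here), and every function name is invented for illustration. This simple version visits all 2^K sub-slates of a K-item slate, so it is exponential in K rather than polynomial; it shows only how merging orderings that reach the same sub-slate avoids the K! enumeration.

```python
from itertools import permutations


def plackett_luce_step(weights):
    """Hypothetical policy: P(next item | set already selected), without replacement."""
    total = sum(weights.values())

    def step(selected, item):
        remaining = total - sum(weights[j] for j in selected)
        return weights[item] / remaining

    return step


def slate_propensity_forward_dp(slate, step):
    """Exact probability that the policy produces `slate`, in any order."""
    items = list(slate)
    k = len(items)
    # flow[mask] = probability of having selected exactly the sub-slate encoded
    # by `mask`, summed over every order in which it could have been built.
    flow = [0.0] * (1 << k)
    flow[0] = 1.0
    # Increasing mask order visits every parent sub-slate before its children.
    for mask in range(1 << k):
        selected = frozenset(items[i] for i in range(k) if (mask >> i) & 1)
        for i in range(k):
            if not (mask >> i) & 1:
                flow[mask | (1 << i)] += flow[mask] * step(selected, items[i])
    return flow[-1]


def slate_propensity_bruteforce(slate, step):
    """Reference answer: sum the ordered propensity over all K! orderings."""
    total = 0.0
    for order in permutations(slate):
        p, chosen = 1.0, []
        for item in order:
            p *= step(frozenset(chosen), item)
            chosen.append(item)
        total += p
    return total


# Illustrative weights only; both calls should agree up to floating-point error.
step = plackett_luce_step({"a": 3.0, "b": 2.0, "c": 1.0, "d": 4.0})
print(slate_propensity_forward_dp({"a", "b", "c"}, step))
print(slate_propensity_bruteforce({"a", "b", "c"}, step))
```

The brute-force function exists only as a cross-check on small slates. The dynamic program reuses the accumulated flow at each sub-slate, which is what merging equivalent histories amounts to in this setting.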
- Quotient DAGs merge evaluation-equivalent histories to reduce variance in importance sampling without sacrificing exactness.
- Forward-DP computes exact unordered slate propensities in polynomial time, eliminating factorial enumeration bottlenecks.
- The framework addresses the mismatch between autoregressive generation and set-based evaluation in recommendation systems.
- Practical propensity-based model selection becomes feasible for production recommender systems using this primitive.
- The approach generalizes beyond slate recommendation to any setting where the generation process records more detail than the evaluation target depends on.
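The variance claim can be checked exactly on a toy problem. The sketch below is one reading of the forward-flow ratio, not the paper's code: it assumes Plackett-Luce behavior and target policies with made-up weights, a reward that depends only on the unordered slate, and hypothetical function names. It compares the usual ordered importance weight with the set-level weight P_target(S) / P_behavior(S). The set-level weight is the conditional expectation of the ordered weight given the slate, so both estimators have the same mean and the set-level one can never have higher variance.

```python
from itertools import permutations


def pl_ordered_prob(order, weights):
    """Plackett-Luce probability of drawing the items in exactly this order."""
    remaining = sum(weights.values())
    p = 1.0
    for item in order:
        p *= weights[item] / remaining
        remaining -= weights[item]
    return p


def compare_weights(behavior, target, reward, k):
    """Exact mean and variance of both estimators, by full enumeration."""
    orders = list(permutations(behavior, k))
    set_b, set_t = {}, {}
    # Brute-force set propensities are fine here because the example is tiny.
    for order in orders:
        s = frozenset(order)
        set_b[s] = set_b.get(s, 0.0) + pl_ordered_prob(order, behavior)
        set_t[s] = set_t.get(s, 0.0) + pl_ordered_prob(order, target)
    true_value = sum(set_t[s] * reward(s) for s in set_t)

    moments = {"ordered": [0.0, 0.0], "quotient": [0.0, 0.0]}
    for order in orders:
        s = frozenset(order)
        p_b = pl_ordered_prob(order, behavior)
        w_ordered = pl_ordered_prob(order, target) / p_b  # trajectory-level ratio
        w_quotient = set_t[s] / set_b[s]                  # set-level ratio
        for name, w in (("ordered", w_ordered), ("quotient", w_quotient)):
            moments[name][0] += p_b * w * reward(s)
            moments[name][1] += p_b * (w * reward(s)) ** 2
    means = {name: m[0] for name, m in moments.items()}
    variances = {name: m[1] - m[0] ** 2 for name, m in moments.items()}
    return true_value, means, variances


# Illustrative policies: uniform logging, target that favors item "a".
behavior = {"a": 1.0, "b": 1.0, "c": 1.0}
target = {"a": 3.0, "b": 1.0, "c": 1.0}
true_value, means, variances = compare_weights(
    behavior, target, reward=lambda s: 1.0 if "a" in s else 0.0, k=2
)
print(true_value, means, variances)
```

In a real system the set propensities would come from a dynamic program over sub-slates rather than from enumeration; enumeration is used here only so that the moments are exact and easy to verify.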