
DiT-Reward: Generative Representations for Text-to-Image Reward Modeling

arXiv – CS AI | Yuanming Yang, Guoqing Ma, Bo Wang, Yuan Zhang, Wei Tang, Chenyi Li, Haoyang Huang, Nan Duan | June 23, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce DiT-Reward, a reward model derived from pretrained Diffusion Transformers that outperforms existing reward models such as HPSv3 at evaluating text-to-image generation quality. The approach shows that representations learned during generative model training transfer effectively to reward prediction, improving both preference prediction accuracy and inference speed.

Analysis

DiT-Reward changes how reward models for text-to-image evaluation are built. Rather than training a reward model from scratch, the team reuses a pretrained diffusion transformer, extracting its learned representations and aggregating them across multiple transformer layers. This transfer learning approach yields consistent gains across multiple preference benchmarks, with particularly strong results on HPDv2 (85.6%) and HPDv3 (77.6%), and delivers a 1.65x inference speedup over HPSv3.
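To make the idea of aggregating representations across layers concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the summary does not say how layers are pooled or combined, so mean pooling over tokens, softmax-weighted layer mixing, and a linear head are assumptions, and every function name below is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average the token features of one layer into a single vector."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, used here to turn layer logits into weights."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_tokens: List[List[Vector]],
                     layer_logits: List[float]) -> Vector:
    """Softmax-weighted sum of mean-pooled features from several layers (assumed scheme)."""
    weights = softmax(layer_logits)
    pooled = [mean_pool(tokens) for tokens in layer_tokens]
    dim = len(pooled[0])
    return [sum(w * p[d] for w, p in zip(weights, pooled)) for d in range(dim)]


def reward_head(features: Vector, head_weights: Vector, bias: float = 0.0) -> float:
    """Linear head that maps the aggregated features to a scalar reward."""
    return sum(f * w for f, w in zip(features, head_weights)) + bias


# Toy input: 2 layers, 2 tokens per layer, feature dimension 2 (made-up numbers).
layer_tokens = [
    [[1.0, 0.0], [3.0, 2.0]],  # layer A, mean-pools to [2.0, 1.0]
    [[0.0, 4.0], [2.0, 0.0]],  # layer B, mean-pools to [1.0, 2.0]
]
features = aggregate_layers(layer_tokens, layer_logits=[0.0, 0.0])  # equal layer weights
score = reward_head(features, head_weights=[1.0, 1.0])
```

In a real system the layer logits and head weights would be learned from preference data while the backbone features come from the pretrained diffusion transformer; here they are fixed by hand purely for illustration.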

The broader context is an industry shift toward more efficient model development. As generative models grow larger and more capable, the downstream infrastructure that supports them, including reward models for preference optimization, becomes increasingly important. Traditional approaches like HPSv3 require separate training pipelines, whereas DiT-Reward builds on an existing generative backbone, which reduces computational overhead and complexity. Probing across model depths shows that representations from middle-to-late layers perform best, which offers insight into how generative knowledge is organized hierarchically.
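A layer-wise probing study of this kind can be sketched as follows, assuming each layer has its own probe that scores both images of a human preference pair. The metric is plain pairwise accuracy; the layer indices and scores are invented for illustration and are not results from the paper.

```python
from typing import Dict, List, Tuple

# Each pair holds (score of the human-preferred image, score of the rejected image).
ScorePair = Tuple[float, float]


def pairwise_accuracy(pairs: List[ScorePair]) -> float:
    """Fraction of pairs where the preferred image receives the higher score."""
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)


def best_probe_layer(per_layer_pairs: Dict[int, List[ScorePair]]) -> Tuple[int, float]:
    """Return the layer whose probe ranks the most pairs correctly, with its accuracy."""
    accuracies = {layer: pairwise_accuracy(pairs)
                  for layer, pairs in per_layer_pairs.items()}
    layer = max(accuracies, key=accuracies.get)
    return layer, accuracies[layer]


# Hypothetical probe outputs for an early, a middle, and a late layer.
toy_scores = {
    4:  [(0.2, 0.5), (0.6, 0.4), (0.1, 0.3), (0.7, 0.9)],
    16: [(0.8, 0.3), (0.6, 0.4), (0.5, 0.2), (0.4, 0.6)],
    24: [(0.9, 0.3), (0.6, 0.4), (0.1, 0.3), (0.4, 0.6)],
}
layer, accuracy = best_probe_layer(toy_scores)
```

The same `pairwise_accuracy` function is the kind of metric behind preference benchmark percentages, although the exact evaluation protocol of each benchmark may differ.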

For the AI and generative model ecosystem, the work has practical implications for improving output quality through policy optimization. The authors demonstrate this by using DiT-Reward as the reward signal when optimizing Stable Diffusion 3.5 Large with Flow-GRPO, with particularly clear gains in realism metrics. Performance scales consistently with backbone capacity, which suggests the approach will remain viable as models continue to grow. The faster inference matters for resource-constrained developers and allows more frequent model refinement cycles, which speeds up iteration in generative AI development.
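The point where a reward model enters GRPO-style optimization can be sketched with the group-relative advantage step: several images are sampled for the same prompt, each is scored by the reward model, and the scores are normalized within the group. This shows only that one step, not Flow-GRPO itself (sampling and the policy update are omitted). The use of the population standard deviation and the `eps` value are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's sample group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images sampled from the same prompt.
group_rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(group_rewards)
```

Samples scored above the group mean get positive advantages and are reinforced, while those below the mean are discouraged, so the quality of the reward model directly shapes what the generator learns.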

Key Takeaways

→ DiT-Reward outperforms HPSv3 on all four evaluated preference benchmarks by leveraging pretrained generative transformer representations
→ The approach runs inference 1.65x faster than HPSv3 without sacrificing accuracy, reducing the computational cost of reward modeling
→ Reward prediction performance concentrates in middle-to-late transformer layers, and aggregating representations across multiple depths helps further
→ Direct scoring in latent space demonstrates that generative knowledge transfers to downstream preference prediction without full fine-tuning
→ Optimizing Stable Diffusion 3.5 Large with Flow-GRPO under the new reward model yields measurable quality improvements, most clearly in realism
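As a quick check on what the 1.65x figure means in practice, the short calculation below converts a speedup factor into the fraction of scoring time saved. It assumes the speedup is a ratio of per-image inference latency, which the summary does not state explicitly.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of the baseline time saved when a task runs `speedup` times faster."""
    return 1.0 - 1.0 / speedup


# 1.65x faster means each score takes 1/1.65 of the baseline time.
saved = time_saved_fraction(1.65)
print(f"Time saved versus baseline: {saved:.1%}")
```

Under that assumption, a 1.65x speedup cuts reward scoring time by roughly 39 percent.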
