AINeutralarXiv โ CS AI ยท 3h ago7/10
๐ง
When Do Diffusion Models learn to Generate Multiple Objects?
Researchers have identified fundamental limitations in how text-to-image diffusion models handle multi-object generation, finding that scene complexity rather than data imbalance is the primary culprit. Through a controlled framework called MOSAIC, they demonstrate that counting objects is particularly difficult in low-data regimes and that compositional generalization collapses when training combinations are systematically excluded.