🧠 AI | ⚪ Neutral | Importance 4/10

Instruction-based Image Editing with Planning, Reasoning, and Generation

arXiv – CS AI | Liya Ji, Chenyang Qi, Qifeng Chen | February 27, 2026 at 05:00 AM

🤖 AI Summary

Researchers propose a multi-modality approach to instruction-based image editing that combines Chain-of-Thought (CoT) planning, editing-region reasoning, and generation. The method pairs large language models with diffusion models, aiming to handle complex editing instructions better than existing single-modality approaches.

Key Takeaways

→ The multi-modality framework splits instruction-based image editing into three components: CoT planning, region reasoning, and generation.
→ Chain-of-Thought planning lets a language model break a complex editing instruction into appropriate sub-prompts.
→ An instruction-based editing-region generation network is trained with multi-modal large language models.
→ A hint-guided editing network, built on text-to-image diffusion models, produces the final image.
→ Experimental results show competitive performance on complex real-world image editing tasks.
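
To make the three-component split concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration only, not the authors' code: every name in it (`plan_sub_prompts`, `reason_region`, `hint_guided_edit`, `Image`, `EditStep`) is hypothetical, each stage is a trivial stand-in for the model the summary describes, and applying the sub-prompts one after another is an assumption rather than something the summary states.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class EditStep:
    sub_prompt: str  # one planned sub-instruction
    region: Box      # where that sub-instruction should apply


@dataclass
class Image:
    width: int
    height: int
    # Placeholder for pixel data: we only log which edits were applied.
    history: List[EditStep] = field(default_factory=list)


def plan_sub_prompts(instruction: str) -> List[str]:
    """Stage 1 stand-in (hypothetical). A real system would prompt a
    language model with Chain-of-Thought; here we only split on ' and '."""
    return [part.strip() for part in instruction.split(" and ") if part.strip()]


def reason_region(image: Image, sub_prompt: str) -> Box:
    """Stage 2 stand-in (hypothetical). A real system would use a
    multi-modal LLM to predict the editing region; here we return the
    whole image as the region."""
    return (0, 0, image.width, image.height)


def hint_guided_edit(image: Image, sub_prompt: str, region: Box) -> Image:
    """Stage 3 stand-in (hypothetical). A real system would run a
    diffusion model with the region as a hint; here we only record the step."""
    new_history = image.history + [EditStep(sub_prompt, region)]
    return Image(image.width, image.height, new_history)


def edit(image: Image, instruction: str) -> Image:
    """Chain the three stages; sequential application is an assumption."""
    for sub_prompt in plan_sub_prompts(instruction):
        region = reason_region(image, sub_prompt)
        image = hint_guided_edit(image, sub_prompt, region)
    return image


if __name__ == "__main__":
    result = edit(Image(64, 48), "remove the car and make the sky orange")
    for step in result.history:
        print(step.sub_prompt, step.region)
```

The point of the sketch is the data flow: the planner turns one instruction into several sub-prompts, the region stage attaches a location to each, and the generator consumes both. Swapping any stand-in for a real model would leave `edit` unchanged.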