AI · Bullish · arXiv – CS AI · 9h ago · 7/10
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
Researchers have developed an automated pipeline that uses dual LLM agents to generate high-quality training data for code translation, particularly for low-resource languages such as Fortran and CUDA. The pipeline produces verified translations with unit tests, along with multi-turn dialogue datasets. A 7B model trained on this data outperforms larger proprietary systems, improving functional correctness on C++-to-CUDA translation by more than 56%.