arXiv – CS AI
Inferring Code Correctness from Specification
Researchers introduce TRAILS, a method for assessing the correctness of code generated by Large Language Models (LLMs). It grounds the LLM's reasoning in concrete input-output pairs derived from the specification. The approach improves code correctness assessment by up to 39% over existing baselines and gives more stable results across repeated evaluation runs.
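The summary does not say how TRAILS derives its input-output pairs or how the LLM uses them. The sketch below is therefore not the TRAILS method. It illustrates only the simpler idea underneath it: input-output pairs taken from a specification serve as concrete evidence about whether a candidate is correct. Here that evidence is checked by running the candidate directly instead of asking an LLM to reason about it. The specification, the example pairs, the two candidates, and the names `check_against_examples` and `judged_correct` are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# One example = (positional arguments, expected return value).
IOPair = Tuple[tuple, Any]


def check_against_examples(
    candidate: Callable[..., Any], examples: Sequence[IOPair]
) -> Dict[str, Any]:
    """Run `candidate` on every example input and record each mismatch."""
    failures: List[Tuple[tuple, Any, Any]] = []
    for args, expected in examples:
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash counts as a failed example
            failures.append((args, expected, f"raised {type(exc).__name__}"))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return {
        "passed": len(examples) - len(failures),
        "total": len(examples),
        "failures": failures,
    }


def judged_correct(report: Dict[str, Any]) -> bool:
    """Hypothetical verdict rule: accept only if every example passes."""
    return report["passed"] == report["total"]


# Hypothetical specification: "return the distinct elements of xs in ascending order".
# Input-output pairs written by hand from that specification.
SPEC_EXAMPLES: List[IOPair] = [
    (([3, 1, 3, 2],), [1, 2, 3]),
    (([],), []),
    (([5, 5],), [5]),
]


def candidate_ok(xs):
    return sorted(set(xs))


def candidate_buggy(xs):
    return sorted(xs)  # bug: keeps duplicates


if __name__ == "__main__":
    for name, fn in [("candidate_ok", candidate_ok), ("candidate_buggy", candidate_buggy)]:
        report = check_against_examples(fn, SPEC_EXAMPLES)
        print(name, report["passed"], "/", report["total"], judged_correct(report))
```

Running the script shows that `candidate_ok` passes all 3 examples. `candidate_buggy` passes only 1, because it keeps duplicates and therefore fails on `[3, 1, 3, 2]` and `[5, 5]`. Passing every example is evidence of correctness, not proof: a candidate can match all listed pairs and still be wrong on inputs the examples do not cover.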