arXiv – CS AI · 6h ago
Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs
Researchers systematically evaluated multiple prompting strategies for LLMs on deterministic computation tasks, finding that standard methods like Chain-of-Thought achieve only moderate accuracy while Program-of-Thought (PoT) and specialized models achieve perfect accuracy by delegating computation to external tools. The study demonstrates that LLMs simulate reasoning patterns rather than reliably performing exact symbolic computation, suggesting hybrid approaches combining LLMs with external executors provide more reliable solutions for deterministic tasks.
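The core idea behind Program-of-Thought can be sketched as follows: rather than asking the model to produce a numeric answer in natural language, the model emits a short program, and a real interpreter executes it so the arithmetic is exact. This is a minimal illustration, not the paper's implementation; `fake_llm` is a hypothetical stand-in for an actual model call.

```python
def fake_llm(prompt: str) -> str:
    # A real PoT pipeline would send `prompt` to an LLM and receive code back.
    # Here we hard-code a plausible completion purely for illustration.
    return "result = sum(i * i for i in range(1, 101))"

def solve_with_pot(question: str) -> int:
    prompt = (
        "Write Python code that stores the final answer in a variable "
        f"named `result`.\nQuestion: {question}"
    )
    code = fake_llm(prompt)
    namespace: dict = {}
    exec(code, namespace)  # delegate the exact computation to the interpreter
    return namespace["result"]

answer = solve_with_pot("What is the sum of the squares of 1..100?")
print(answer)  # 338350
```

Because the interpreter, not the model, performs the arithmetic, the result is deterministic; the model only needs to translate the question into correct code.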