AI · Neutral · arXiv – CS AI · 7h ago · 6/10
The Impact of LLM Self-Consistency and Reasoning Effort on Automated Scoring Accuracy and Cost
Researchers analyzing LLM-based automated scoring found that strategic model selection and reasoning configuration improve accuracy more than ensemble methods do. Temperature sampling improved performance, but larger ensemble sizes showed diminishing returns. Higher reasoning effort correlated with better accuracy, though the cost-benefit ratio varied across model families.
OpenAI · GPT-5 · Gemini