AI · Neutral · arXiv CS AI · 4h ago · 6/10
Designing Reliable LLM-Assisted Rubric Scoring for Constructed Responses: Evidence from Physics Exams
Researchers evaluated how reliably GPT-4o scores constructed physics exam responses against rubrics, finding that AI reliability matches human inter-rater consistency when rubrics are well-structured and granular. The study finds that clear rubric design matters far more than LLM configuration choices, and that performance declines on ambiguous mid-range responses.
GPT-4