AI · Neutral · arXiv – CS AI · 3h ago · 6/10
When prompt perturbations break your A/B test: A valid statistical test for generative surveying
Researchers show that standard hypothesis tests can lose validity in generative surveying, where LLM-based personas supply market research feedback, because the responses are sensitive to prompt perturbations. The study proposes a valid permutation test that accounts for this prompt sensitivity and gives guidance on allocating resources optimally in this emerging research methodology.
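For readers unfamiliar with permutation tests, here is a minimal generic sketch in Python. It is not the procedure from the paper, whose details are not given here. It assumes, purely for illustration, that each product variant was rated under several prompt paraphrases and that the per-paraphrase mean rating is the unit being permuted. The function name `prompt_level_permutation_test` and its inputs are hypothetical.

```python
import random
from statistics import mean


def prompt_level_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    scores_a, scores_b: hypothetical inputs, one number per prompt paraphrase
    (for example, the mean persona rating obtained under that paraphrase)
    for variants A and B. Treating the paraphrase as the exchangeable unit
    is an assumption of this sketch, not a claim about the paper's method.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction: the observed labelling counts as one permutation,
    # so the Monte Carlo p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    a = [3.9, 4.1, 4.4, 3.8, 4.2]
    b = [3.6, 3.9, 3.5, 4.0, 3.7]
    diff, p = prompt_level_permutation_test(a, b)
    print(f"difference in means: {diff:.3f}, p-value: {p:.4f}")
```

Permuting one summary value per paraphrase, rather than every individual persona response, is one simple way to avoid treating responses that share a prompt as independent. Whether this matches the paper's construction is not stated in the summary.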