AI · Bearish · arXiv – CS AI · 3h ago · 7/10
Paraphrase Brittleness in Production Retrieval-Augmented Commercial Recommendation: Reproducibility Below the Rerun-Stability Baseline
The study finds that retrieval-augmented AI recommendation systems are highly brittle to paraphrased queries. Recommendation-set similarity drops to 0.288 for cosmetic rewordings and to 0.135 for constraint-modified queries, both far below the 0.50-0.61 baseline obtained by rerunning identical prompts. This undermines the reliability of the AI visibility tracking metrics used in commercial recommendation optimization, because brand mention frequency depends more on how a prompt is phrased than on actual model behavior.
🏢 OpenAI · 🏢 Anthropic
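
To make the comparison in the summary concrete, here is a minimal sketch of how a rerun-stability baseline and a paraphrase score could be computed from sets of recommended brands. The summary does not say which similarity measure the paper uses, so Jaccard similarity is an assumption here. The function names and the brand lists are hypothetical and are not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets (assumed metric)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(result_sets):
    """Average Jaccard similarity over all unordered pairs of result sets."""
    pairs = list(combinations(result_sets, 2))
    if not pairs:
        raise ValueError("need at least two result sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs from rerunning one identical prompt three times.
reruns = [
    ["BrandA", "BrandB", "BrandC", "BrandD"],
    ["BrandA", "BrandB", "BrandC", "BrandE"],
    ["BrandA", "BrandB", "BrandD", "BrandE"],
]

# Hypothetical outputs from three paraphrases of the same request.
paraphrases = [
    ["BrandA", "BrandB", "BrandC", "BrandD"],
    ["BrandA", "BrandF", "BrandG", "BrandH"],
    ["BrandB", "BrandI", "BrandJ", "BrandK"],
]

rerun_baseline = mean_pairwise_similarity(reruns)
paraphrase_score = mean_pairwise_similarity(paraphrases)

print(f"rerun baseline:   {rerun_baseline:.3f}")
print(f"paraphrase score: {paraphrase_score:.3f}")
```

A paraphrase score below the rerun baseline, as in the reported figures, means rewording changes the recommendations more than sampling noise alone. A brand-mention count taken from a single phrasing therefore says little about how the system behaves across the ways real users ask.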