#ab-testing News & Analysis

3 articles tagged with #ab-testing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles

AIBullisharXiv – CS AI · Mar 37/103

🧠

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Meta presents CharacterFlywheel, an iterative process for improving large language models in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, the system achieved significant improvements through 15 generations of refinement, with the best models showing up to 8.8% improvement in engagement breadth and 19.4% in engagement depth while substantially improving instruction following capabilities.

AIBullisharXiv – CS AI · Mar 37/109

🧠

SimAB: Simulating A/B Tests with Persona-Conditioned AI Agents for Rapid Design Evaluation

SimAB is a new system that uses persona-conditioned AI agents to simulate A/B tests for rapid design evaluation without requiring real user traffic. The system achieved 67% overall accuracy against 47 historical A/B tests, rising to 83% for high-confidence cases, potentially transforming how companies validate design decisions.

AIBullisharXiv – CS AI · Feb 276/106

🧠

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

Apple's App Store search team successfully implemented LLM-generated textual relevance labels to augment their ranking system, addressing data scarcity issues. A fine-tuned specialized model outperformed larger pre-trained models, generating millions of labels that improved search relevance. This resulted in a statistically significant 0.24% increase in conversion rates in worldwide A/B testing.