
Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

arXiv – CS AI | Rebecca M. M. Hicke, Kiran Tomlinson | May 29, 2026 at 04:00 AM

🤖AI Summary

Researchers analyzed 12,000 Microsoft Bing Copilot users over time and found that individual user behavior with LLMs remains remarkably consistent despite broader population-level trends, with significant variation between active and casual users. The study reveals that existing datasets like WildChat-4.8M predominantly represent power users and fail to capture typical user-AI interactions.

Analysis

This longitudinal study addresses a gap in AI research by examining whether individual users change how they engage with large language models over extended periods. Rather than treating user behavior as a static snapshot, the researchers tracked conversational trajectories to measure behavioral persistence and adaptation. The findings challenge assumptions underlying many AI training datasets and user behavior models.

The research reveals an apparent paradox: population-level statistics show clear trends in LLM usage, yet individual users keep habits that resist change. This distinction matters for product development and AI alignment research, because interventions designed to modify user behavior may be limited by that persistence.

The stark differences between power users and casual users, with active users accomplishing more complex, professionally oriented tasks, point to substantial heterogeneity that aggregate statistics often hide. This heterogeneity affects how AI systems should be optimized and evaluated.

The caveat about WildChat-4.8M's skew toward power users has direct implications for downstream machine learning applications trained on that data. Models trained on power-user conversations may develop biases that misalign with how typical users interact with AI systems. The research underscores the importance of representative datasets and warns against generalizing from heavily skewed samples. For AI development and deployment, understanding user heterogeneity enables more targeted feature design, better safety considerations, and more accurate performance expectations across user segments.
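The gap between population-level trends and individual persistence can be made concrete with a toy calculation. The sketch below is illustrative only: the user IDs, task labels, and counts are invented, and the mechanism it shows (new users joining with different habits) is an assumption for illustration, not a finding attributed to the study. It computes the share of users doing a given task in each period alongside the fraction of returning users whose task type stayed the same.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration (not from the study).
# Each dict maps a user ID to that user's dominant task type in a period.
period_1 = {"u1": "coding", "u2": "coding", "u3": "search"}
period_2 = {
    "u1": "coding", "u2": "coding", "u3": "search",  # returning users, unchanged
    "u4": "search", "u5": "search",                  # new users with other habits
}


def task_share(period, task):
    """Fraction of users in a period whose task type equals `task`."""
    counts = Counter(period.values())
    return counts[task] / len(period)


def persistence(before, after):
    """Fraction of returning users whose task type did not change."""
    returning = before.keys() & after.keys()
    unchanged = sum(before[u] == after[u] for u in returning)
    return unchanged / len(returning)


share_1 = task_share(period_1, "search")    # population-level share, period 1
share_2 = task_share(period_2, "search")    # population-level share, period 2
stickiness = persistence(period_1, period_2)

print(f"search share: {share_1:.2f} -> {share_2:.2f}")
print(f"returning users unchanged: {stickiness:.2f}")
```

In this toy setup the population-level share of "search" users rises even though every returning user behaves exactly as before, which is why a snapshot of aggregate shares alone cannot separate the two effects.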

Key Takeaways

→ Individual user behavior with LLMs is highly persistent despite population-level trend shifts, suggesting that behavioral adaptation is difficult to achieve.
→ Power users employ AI for complex, professional tasks, while casual users show lower success rates and different usage patterns.
→ The WildChat-4.8M dataset significantly overrepresents experienced power users, which can bias models trained on it.
→ User heterogeneity is substantial but largely invisible in aggregate statistics, which affects AI product design decisions.
→ Longitudinal analysis reveals user behavior patterns that static snapshot studies miss, so studying them requires different research methodologies.
