Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model
A study evaluating the consistency of exercise prescriptions generated by Gemini 2.5 Flash found high semantic consistency but significant variability in quantitative components like exercise intensity. The research highlights that while LLMs produce semantically similar outputs, structural constraints and expert validation are necessary before clinical deployment.
This research addresses a critical gap in understanding LLM reliability for healthcare applications. The study's repeated-generation design—producing 120 outputs across six clinical scenarios—provides empirical evidence that LLMs behave inconsistently even under identical input conditions, a phenomenon often overlooked in enthusiastic discussions of AI adoption. The findings reveal a nuanced reliability profile: semantic similarity scores of 0.879-0.939 suggest the model maintains thematic coherence, yet 10-25% of resistance training outputs contained unclassifiable intensity expressions, directly undermining clinical usability.
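The repeated-generation consistency metric can be illustrated with a minimal sketch. The study likely used embedding-based similarity; the bag-of-words cosine similarity below is a simplified stand-in, and the sample prescriptions are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (simplified proxy
    for the embedding-based semantic similarity a real study would use)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(outputs: list[str]) -> float:
    """Average similarity over all pairs of repeated generations
    for one clinical scenario."""
    pairs = list(combinations(outputs, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical repeated generations for a single scenario
outputs = [
    "Walk 30 minutes at moderate intensity, 5 days per week.",
    "Walk briskly for 30 minutes at moderate intensity, 5 days weekly.",
    "Moderate-intensity walking, 30 minutes, five days per week.",
]
print(round(mean_pairwise_similarity(outputs), 3))
```

A score near 1.0 indicates the repeated outputs say roughly the same thing; the study's point is that such surface agreement can coexist with divergent quantitative details.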
These findings speak to broader concerns about deploying LLMs in regulated healthcare environments. While the technology shows promise for generating personalized content at scale, the variability in quantitative prescriptions—the precise intensity, duration, or frequency specifications that clinicians require—exposes fundamental limitations in current models. The finding that safety expressions varied significantly despite appearing in 100% of outputs demonstrates that consistent inclusion and consistent quality are not the same thing.
For healthcare providers and developers building AI-assisted clinical tools, this study provides actionable validation requirements. The emphasis on prompt structure's influence on consistency suggests that careful engineering can improve reliability but cannot eliminate variability entirely. The research effectively demonstrates that LLM outputs require systematic validation against clinical standards before patient-facing deployment. This positions expert validation not as an optional enhancement but as mandatory infrastructure for healthcare AI systems. Organizations developing clinical decision-support tools should view this as evidence that governance frameworks and human oversight remain essential components of any LLM-based healthcare application.
- LLM-generated exercise prescriptions show high semantic consistency (0.879-0.939 cosine similarity) but significant variability in quantitative components like exercise intensity
- 10-25% of resistance training outputs contained unclassifiable intensity expressions, indicating critical gaps for clinical deployment
- Safety expressions appeared in all outputs but varied significantly in frequency, revealing inconsistency between content inclusion and content quality
- Prompt structure substantially influences LLM consistency, suggesting that careful engineering can improve but not eliminate variability
- Expert validation and additional structural constraints are mandatory before deploying LLM-generated clinical prescriptions in healthcare settings
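The "unclassifiable intensity" finding suggests one concrete structural constraint: automatically flagging outputs whose intensity expression does not match a recognized clinical format. The sketch below is a hypothetical illustration; the regex patterns and sample prescriptions are assumptions, not the study's actual classification scheme, and real validation would follow published exercise guidelines plus expert review:

```python
import re

# Hypothetical patterns for recognized resistance-training intensity formats
INTENSITY_PATTERNS = [
    re.compile(r"\d{1,3}\s*%\s*(of\s*)?1\s*-?\s*RM", re.IGNORECASE),  # e.g. "70% 1RM"
    re.compile(r"RPE\s*\d{1,2}(\s*-\s*\d{1,2})?", re.IGNORECASE),     # e.g. "RPE 7-8"
    re.compile(r"\d{1,2}\s*-?\s*\d{0,2}\s*reps?\b", re.IGNORECASE),   # e.g. "8-12 reps"
]

def has_classifiable_intensity(prescription: str) -> bool:
    """True if the text contains at least one recognized intensity expression."""
    return any(p.search(prescription) for p in INTENSITY_PATTERNS)

# Hypothetical generated prescriptions
outputs = [
    "3 sets of 8-12 reps at 70% 1RM, twice weekly.",
    "Perform resistance training at a challenging but comfortable effort.",
]
flagged = [o for o in outputs if not has_classifiable_intensity(o)]
print(f"{len(flagged)}/{len(outputs)} outputs flagged for expert review")
```

A gate like this routes vague outputs to a human reviewer rather than rejecting them outright, which matches the study's framing of expert validation as mandatory infrastructure rather than an afterthought.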