Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation
Researchers benchmarked 43 large language models used for academic scholar recommendations, revealing that prompt design significantly affects recommendation quality and diversity. The study found that model choice, persona prompting (language, location, role), and context variables independently shape which scholars are recommended, with geographic location prompts producing the most variation in factuality and representativeness across disciplines.
This research addresses a critical gap in AI auditing by examining how large language models function as gatekeepers in academic discovery. LLMs are increasingly deployed to identify experts and shape academic discourse, yet previous audits did not isolate whether model choice or prompt design drives output variability. By systematically testing 43 models across multiple variables, the benchmark provides empirical evidence that prompt engineering is not a cosmetic adjustment but a fundamental lever affecting recommendation quality.
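To make the idea of isolating factors concrete, the sketch below enumerates a full grid of audit conditions so that each factor can be varied while the others are held fixed. It is only an illustration: the model names, factor levels, prompt wording, and the functions `build_prompt` and `audit_grid` are hypothetical and are not taken from the study.

```python
from itertools import product

# Hypothetical factor levels, chosen only for illustration.
MODELS = ["model-a", "model-b"]
LANGUAGES = ["English", "Japanese"]
LOCATIONS = ["South Africa", "Japan"]
ROLES = ["student", "professor"]
FIELDS = ["physics", "sociology"]


def build_prompt(language, location, role, field):
    """Compose a persona prompt from the three persona attributes and one context variable."""
    return (
        f"You are a {role} based in {location} who writes in {language}. "
        f"Recommend five scholars working in {field}."
    )


def audit_grid():
    """Return one condition per combination of model, persona, and context."""
    conditions = []
    for model, language, location, role, field in product(
        MODELS, LANGUAGES, LOCATIONS, ROLES, FIELDS
    ):
        conditions.append(
            {
                "model": model,
                "language": language,
                "location": location,
                "role": role,
                "field": field,
                "prompt": build_prompt(language, location, role, field),
            }
        )
    return conditions


grid = audit_grid()
```

Because every combination appears exactly once, differences in output between two conditions that share all but one factor can be attributed to that factor rather than to a confounded change in model or prompt.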
The findings carry important implications for institutions relying on LLM-based scholar discovery systems. Geographic location in persona prompts produced striking disparities: prompts with a South African persona generated less factual recommendations, while prompts with a Japanese persona yielded highly factual but homogeneous lists biased toward prolific scholars. This suggests that seemingly neutral prompt design choices embed geographic and cultural biases that distort academic visibility. The research also indicates that factuality and diversity often trade off against each other: optimizing for accuracy may inadvertently narrow the pool of visible scholars.
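The trade-off described above can be illustrated with two simple scores. This is a minimal sketch under stated assumptions: the study's actual metrics are not given here, so `factuality` (share of recommended names found in a reference set) and `distinct_share` (distinct names divided by total names across repeated runs) are stand-in definitions, and all names in the example are made up.

```python
def factuality(recommended, known_scholars):
    """Share of recommended names that appear in a reference set (assumed metric)."""
    if not recommended:
        return 0.0
    return sum(name in known_scholars for name in recommended) / len(recommended)


def distinct_share(runs):
    """Distinct names divided by total names across repeated runs (assumed metric)."""
    names = [name for run in runs for name in run]
    if not names:
        return 0.0
    return len(set(names)) / len(names)


# Made-up reference set and outputs for two hypothetical prompt conditions.
known = {"A. Real", "B. Real", "C. Real"}
homogeneous_runs = [["A. Real", "B. Real"], ["A. Real", "B. Real"]]
varied_runs = [["A. Real", "X. Unknown"], ["C. Real", "Y. Unknown"]]

homogeneous_factuality = [factuality(run, known) for run in homogeneous_runs]
varied_factuality = [factuality(run, known) for run in varied_runs]
homogeneous_diversity = distinct_share(homogeneous_runs)
varied_diversity = distinct_share(varied_runs)
```

In this toy data the first condition is fully factual but repeats the same two names in every run, while the second surfaces more distinct names at the cost of unverifiable ones, which mirrors the pattern the summary attributes to location-based personas.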
For the broader AI industry, this work demonstrates that auditing AI systems requires examining multiple dimensions of variability simultaneously. Organizations deploying LLMs in decision-making roles must recognize that prompt engineering decisions have measurable social consequences. Because the study covers six fields, its effects do not appear to be specific to a single discipline, which supports its relevance beyond academia. As LLMs increasingly influence hiring, grant decisions, and reputation systems, systematic auditing of prompt design effects becomes essential infrastructure for fair AI deployment. Future work should develop standardized persona auditing frameworks and establish guidelines for transparent documentation of prompt engineering decisions.
- Prompt design significantly influences LLM-based scholar recommendations independent of model choice, affecting factuality, diversity, and geographic representation.
- Geographic location in persona prompts creates measurable biases, with South African prompts yielding less factual results and Japanese prompts showing homogeneity bias.
- Model selection drives basic technical quality while context variables (field, seniority) primarily determine parity and fairness in recommendations.
- The factuality-diversity trade-off suggests that optimizing for recommendation accuracy may inadvertently reduce scholarly diversity and visibility.
- Systematic auditing of prompt engineering effects should become standard practice for organizations deploying LLMs in gatekeeping roles like expert discovery.