Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
Researchers introduce QD-LLM, a framework that evolves lightweight prompt embeddings (~32K parameters) to steer frozen large language models toward diverse outputs without fine-tuning. The approach outperforms existing quality-diversity optimization methods by 46.4% in coverage and demonstrates practical applications in test generation and training data improvement.
QD-LLM addresses a fundamental limitation in large language model deployment: mode collapse, where models produce homogeneous outputs despite the vast solution spaces they could in principle explore. By evolving compact prompt embeddings with gradient-free optimization, the framework steers the behavior of frozen LLMs without computationally expensive fine-tuning. This separates behavior customization from model modification, allowing researchers to keep model weights intact while achieving diverse generation patterns.
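To make the mechanism concrete, here is a minimal MAP-Elites-style sketch of evolving soft-prompt embeddings against a frozen model. The `quality` and `behavior` functions are hypothetical stand-ins for scoring the model's generations, and the shapes, grid size, and mutation scale are illustrative rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TOKENS, EMB_DIM = 8, 64   # toy soft-prompt shape; the paper's is ~32K params total
GRID = 20                   # archive cells per behavior dimension

def quality(emb):
    # Hypothetical stand-in: score the frozen LLM's output under this soft prompt.
    return -float(np.mean(emb ** 2))

def behavior(emb):
    # Hypothetical 2-D behavior descriptor of the generated text
    # (e.g., length and sentiment in a real system), squashed to [-1, 1].
    return np.tanh([emb.mean(), emb.std() - 1.0])

def cell(desc):
    # Discretize the descriptor onto the archive grid.
    idx = ((desc + 1.0) / 2.0 * GRID).astype(int)
    return tuple(np.clip(idx, 0, GRID - 1))

archive = {}  # cell -> (quality, embedding): one elite per behavior niche

for _ in range(2000):
    if archive:
        # Select a random elite and mutate it (gradient-free variation).
        _, parent = archive[list(archive)[rng.integers(len(archive))]]
        child = parent + 0.1 * rng.standard_normal((N_TOKENS, EMB_DIM))
    else:
        child = rng.standard_normal((N_TOKENS, EMB_DIM))
    q, c = quality(child), cell(behavior(child))
    # Keep the child only if its niche is empty or it beats the current elite.
    if c not in archive or q > archive[c][0]:
        archive[c] = (q, child)

print(f"coverage: {len(archive) / GRID**2:.1%}")
```

In QD-LLM itself, quality would come from a task-specific scorer and the behavior descriptors from the hybrid characterization described next.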
The technical innovation combines semantic and explicit behavior characterization with formal coverage bounds, validated through near-independence testing of the descriptor dimensions (NMI = 0.08 ± 0.02). Co-evolutionary variation operators enable targeted behavioral mutation via finite-difference gradient estimation, giving the search a directed, efficient way to reach unexplored regions of behavior space. Across HumanEval, MBPP, and creative writing benchmarks, QD-LLM demonstrates substantial improvements over comparable methods with statistical significance (p<0.001 across 30 runs).
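As a rough illustration of the finite-difference idea (a sketch under assumptions, not the paper's exact operator), one can estimate how a scalar behavior descriptor responds to embedding perturbations and then step toward a target descriptor value:

```python
import numpy as np

def directed_mutation(emb, descriptor_fn, target, eps=1e-2, step=0.5,
                      n_dirs=16, rng=None):
    """Nudge a soft prompt toward a target value of a scalar behavior
    descriptor using a random-direction finite-difference gradient estimate.

    `descriptor_fn` is a hypothetical stand-in that maps an embedding to a
    measured behavior of the frozen LLM's output (e.g., response length).
    """
    rng = rng or np.random.default_rng()
    base = descriptor_fn(emb)
    grad = np.zeros_like(emb)
    for _ in range(n_dirs):
        u = rng.standard_normal(emb.shape)
        u /= np.linalg.norm(u)
        # Forward difference along a random unit direction.
        slope = (descriptor_fn(emb + eps * u) - base) / eps
        grad += slope * u
    grad /= n_dirs
    # Move the descriptor toward `target` along the estimated gradient.
    return emb + step * (target - base) * grad
```

The near-independence of descriptor dimensions can be checked with standard tools such as scikit-learn's `normalized_mutual_info_score` applied to discretized descriptor pairs.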
The practical implications extend beyond academic benchmarks. Diverse generation archives uncover 34% more edge cases in test generation and enhance fine-tuning data quality, yielding 8.3% accuracy gains on downstream tasks. This suggests quality-diversity optimization can materially improve LLM reliability and robustness in production environments. Validation across multiple open-source models (Llama-3-70B, Mistral-Large) indicates the approach generalizes across architectures.
The framework bridges neuroevolution and modern LLMs by treating prompt embeddings as evolvable neural interfaces. This paradigm could enable organizations to customize LLM behavior efficiently without resource-intensive retraining, making advanced LLM applications more accessible to compute-constrained teams.
- QD-LLM evolves tiny prompt embeddings (~32K parameters) to steer 70B+ parameter frozen LLMs, eliminating fine-tuning needs while achieving diverse outputs.
- The framework achieved 46.4% higher coverage and 41.4% higher quality-diversity scores than existing methods, with statistical significance (p<0.001).
- Diverse generation archives from QD-LLM uncovered 34% more edge cases in test generation and improved downstream fine-tuning accuracy by 8.3%.
- Hybrid behavior characterization combining semantic and explicit features with formal coverage bounds enables reliable measurement of output diversity.
- The approach generalizes across multiple open-source LLMs, establishing prompt embedding evolution as a practical paradigm for behavioral customization.