PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation
PersonaDrive is a retrieval-augmented vision-language-action (VLA) system that lets autonomous driving agents exhibit diverse, human-like behavioral styles in simulation. Using demonstrations from human drivers instructed to drive aggressively, neutrally, or conservatively, the system reports a 4.6% improvement over SimLingo on Bench2Drive while allowing style selection without per-style retraining.
PersonaDrive addresses a critical gap in autonomous driving simulation: the ability to generate realistic, behaviorally diverse traffic agents that reflect how humans actually drive under different conditions. Traditional simulation environments populate non-ego vehicles with homogeneous rule-based or single-mode learned behaviors, limiting the realism of testing scenarios. This work moves beyond proxy signals like post-hoc labels or LLM-inferred rewards by directly leveraging explicit human demonstrations collected through a driver-in-the-loop setup where participants drove CARLA routes under specific behavioral instructions.
The technical approach separates style learning from the core driving model through a three-stage pipeline: offline triplet mining identifies behaviorally relevant examples, a lightweight retrieval head learns to match situations to appropriate demonstrations, and a frozen VLA backbone uses the retrieved context for in-context learning. The practical benefit is that changing style requires only swapping the demonstration database at inference, which eliminates expensive per-style retraining.
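To make the database-swap idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The `Demo` record, the two-dimensional embeddings, the `STYLE_DBS` contents, and the cosine-based `triplet_loss` are all hypothetical stand-ins: the source does not specify the embedding model, the retrieval metric, or the training loss, so a standard triplet margin loss and cosine similarity are assumed here purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    """One stored human demonstration (hypothetical record layout)."""
    embedding: list[float]  # situation embedding, assumed to come from the retrieval head
    action: str             # placeholder for the demonstrated maneuver or trajectory


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on cosine distance (assumed, not from the source)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def retrieve(query, database, k=1):
    """Return the k demonstrations whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda d: cosine(query, d.embedding), reverse=True)
    return ranked[:k]


# Hypothetical per-style demonstration databases. Switching style means
# selecting a different database; the retrieval code and backbone are unchanged.
STYLE_DBS = {
    "aggressive": [
        Demo([1.0, 0.0], "overtake"),
        Demo([0.0, 1.0], "accelerate through gap"),
    ],
    "conservative": [
        Demo([1.0, 0.0], "follow at distance"),
        Demo([0.0, 1.0], "yield"),
    ],
}


def build_context(query, style, k=1):
    """Collect retrieved demonstrations to pass to a frozen backbone as context."""
    return [demo.action for demo in retrieve(query, STYLE_DBS[style], k)]


query = [0.9, 0.1]  # hypothetical embedding of the current driving situation
aggressive_context = build_context(query, "aggressive")
conservative_context = build_context(query, "conservative")
print(aggressive_context, conservative_context)
```

The same query yields different in-context examples depending only on which database is selected, which is the property that would let a frozen backbone change style without retraining.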
For the autonomous driving industry, PersonaDrive's results carry meaningful implications. The 4.6% improvement over SimLingo and the consistent performance across behavioral styles suggest that demonstration-based learning generalizes better than proxy signals. The ability to generate diverse agents with measurable behavioral variations (an 18% speed increase and a 25% rise in acceleration from conservative to aggressive driving) enables more thorough safety validation across realistic driving scenarios.
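The style gaps quoted above are relative changes measured against the conservative baseline. The short sketch below shows that arithmetic. The metric names and raw values are hypothetical placeholders chosen only so the percentages match the reported figures. They are not measurements from the paper.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical per-style averages, for illustration only.
conservative = {"mean_speed_mps": 10.0, "mean_accel_mps2": 1.0}
aggressive = {"mean_speed_mps": 11.8, "mean_accel_mps2": 1.25}

# Compare each aggressive metric against its conservative counterpart.
style_gap = {
    name: relative_change(conservative[name], aggressive[name])
    for name in conservative
}

for name, pct in style_gap.items():
    print(f"{name}: {pct:+.1f}%")
```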
Looking forward, the framework's modular design suggests broader applications beyond style variation. Researchers may extend this to include regional driving differences, weather-dependent behaviors, or vehicle-type-specific driving patterns. The success of in-context learning for driving control could also influence how future autonomous systems adapt to novel environments.
- PersonaDrive uses retrieved human driving demonstrations to condition VLA agents on specific behavioral styles without per-style retraining
- The system outperforms prior baselines on Bench2Drive, with a 4.6% improvement over SimLingo and consistent performance across behavioral styles
- A human-instructed driving dataset enables realistic style variation: from conservative to aggressive driving, agents show an 18% increase in speed and a 25% rise in acceleration
- The retrieval-augmented approach separates style learning from the core driving model, improving both generalization and computational efficiency
- Demonstration-based behavioral conditioning outperforms proxy signals such as LLM-inferred rewards for creating human-like traffic simulation agents