MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings
Researchers propose MM-tau-pยฒ, a new benchmark for evaluating multi-modal AI agents that adapt to user personas in customer service settings. The framework introduces 12 novel metrics to assess robustness and performance of LLM-based agents using voice and visual inputs, showing limitations even in advanced models like GPT-4 and GPT-5.