AI · Bearish · arXiv – CS AI · 8h ago · 7/10
Measuring Behavior Portability in Large Language Models
A new research framework reveals that large language models exhibit inconsistent behavior across structurally equivalent decision environments, demonstrating significant portability losses when behavioral patterns learned in one setting are applied to another. The findings suggest that LLM evaluations based on single environments may be unreliable for predicting real-world autonomous decision-making performance.