
Evaluating LLM Alignment With Human Trust Models

arXiv – CS AI | Anushka Debnath, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Emiliano Lorini
🤖 AI Summary

Researchers analyzed how the GPT-J-6B language model internally represents and reasons about trust by comparing its embeddings to established human trust models. The study found that the AI's trust representation most closely aligns with the Castelfranchi socio-cognitive model, suggesting LLMs encode social concepts in meaningful ways.

Key Takeaways
  • GPT-J-6B's internal trust representation aligns most closely with the Castelfranchi socio-cognitive model of human trust.
  • The research used contrastive prompting to analyze trust-related embedding vectors in the AI model's activation space.
  • LLMs appear to encode socio-cognitive constructs in ways that enable meaningful comparative analysis with human models.
  • The findings could inform the design of more effective human-AI collaborative systems.
  • This white-box analysis approach provides insights into how AI systems conceptualize interpersonal relationships.
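The contrastive-prompting approach mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes that hidden states for trust-laden versus trust-neutral prompts have already been extracted from a model (here replaced by synthetic stand-in vectors), takes the difference of their means as a candidate "trust" direction in activation space, and scores that direction against vectors representing components of competing human trust models (also stand-ins).

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_direction(pos_acts, neg_acts):
    """Unit vector from the difference of mean activations.

    pos_acts / neg_acts: (n_prompts, hidden_dim) arrays of hidden states
    taken at some layer for contrastive prompt pairs. The returned unit
    vector is a candidate 'trust' direction in activation space.
    """
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins: in the study these would be GPT-J-6B hidden
# states for prompts like "A trusts B because..." vs. a contrastive
# counterpart; here random vectors shifted by a shared offset play
# that role.
hidden = 64
shared = rng.normal(size=hidden)
pos = rng.normal(size=(8, hidden)) + shared  # trust-laden prompts
neg = rng.normal(size=(8, hidden)) - shared  # contrastive prompts

trust_dir = contrastive_direction(pos, neg)

# Score the extracted direction against vectors representing components
# of competing human trust models (names and vectors are illustrative).
candidates = {
    "castelfranchi_component": shared / np.linalg.norm(shared),
    "random_baseline": rng.normal(size=hidden),
}
scores = {name: cosine(trust_dir, v) for name, v in candidates.items()}
```

The comparative step in the paper would then rank trust models by how well their components align with directions recovered from the model's activations; here the synthetic "castelfranchi_component" vector scores highest by construction.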
Read Original → via arXiv – CS AI