#human-ai-alignment News & Analysis

11 articles tagged with #human-ai-alignment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

11 articles

AIBullisharXiv – CS AI · 4d ago7/10

🧠

Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting

Researchers propose the Beta-Bernoulli Calibrator (BBC), a novel method that improves large language model forecasting by converting point estimates into probability distributions using both binary outcomes and aggregated human forecast signals. The approach demonstrates better calibration and accuracy than existing post-hoc methods while leveraging epistemic uncertainty as a more reliable error predictor than verbalized confidence.

AINeutralarXiv – CS AI · Mar 56/10

🧠

Automated Concept Discovery for LLM-as-a-Judge Preference Analysis

Researchers developed automated methods to discover biases in Large Language Models when used as judges, analyzing over 27,000 paired responses. The study found LLMs exhibit systematic biases including preference for refusing sensitive requests more than humans, favoring concrete and empathetic responses, and showing bias against certain legal guidance.

AINeutralarXiv – CS AI · 3d ago6/10

🧠

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

Researchers introduce 'Behavioral Specification,' a compressed interpretive layer that captures user preferences more accurately than raw data or extracted facts, achieving 25x context reduction while improving AI alignment on interpretation-heavy tasks. The work establishes 'representational accuracy' as a distinct metric from recall, demonstrating that faithful user representation is critical for human-AI alignment across diverse populations.

AINeutralarXiv – CS AI · 3d ago6/10

🧠

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

Researchers compared how large language models rate the interestingness of math problems against human judgments from college students and International Math Olympiad competitors. While LLMs show broad agreement with humans, they fail to match the distribution of human preferences and poorly explain why problems are interesting, though they can generate novel engaging problems after validity filtering.

AINeutralarXiv – CS AI · 4d ago6/10

🧠

Reasoning and Planning with Dynamically Changing Norms

Researchers present a novel framework enabling AI agents to understand and follow dynamically changing human norms during planning and decision-making. The work introduces a defeasible calculus to resolve normative conflicts and demonstrates the approach through an AI agent called SocialBot on natural language dialogue tasks, advancing the field of norm-guided AI planning in human-AI interaction contexts.

AIBullisharXiv – CS AI · May 116/10

🧠

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Researchers compared frontier Large Reasoning Models (LRMs) with traditional AI systems using human gameplay data paired with fMRI brain recordings. LRMs demonstrated superior alignment with human learning behavior and predicted brain activity an order of magnitude better than reinforcement learning alternatives, suggesting they more closely mirror human cognition during complex decision-making.

AIBullisharXiv – CS AI · Feb 276/107

🧠

Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility

Researchers have identified 'modal difference vectors' in language models that can distinguish between possible, impossible, and nonsensical statements, revealing better modal categorization abilities than previously thought. The study shows these vectors emerge consistently as models become more capable and can even predict human judgment patterns about event plausibility.

AINeutralarXiv – CS AI · Apr 64/10

🧠

Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting

Research reveals that large language models can reproduce the qualitative structure of human social reasoning but struggle with quantitative magnitude calibration. Pragmatic prompting strategies that consider speaker knowledge and motives can improve this calibration, though fine-grained accuracy remains partially unresolved.

AIBullisharXiv – CS AI · Mar 175/10

🧠

Human-like Object Grouping in Self-supervised Vision Transformers

Researchers developed a behavioral benchmark showing that self-supervised vision transformers, particularly those trained with DINO objectives, align closely with human object perception and segmentation behavior. The study found that models with stronger object-centric representations better predict human visual judgments, with Gram matrix structure playing a key role in perceptual alignment.

AINeutralarXiv – CS AI · Mar 54/10

🧠

MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations

Researchers have released MuSaG, the first German multimodal sarcasm detection dataset featuring 33 minutes of annotated television content with text, audio, and video data. The study reveals a significant gap between human sarcasm detection (which relies heavily on audio cues) and current AI models (which perform best with text).

AINeutralarXiv – CS AI · Mar 34/105

🧠

Emerging Human-like Strategies for Semantic Memory Foraging in Large Language Models

Researchers analyzed how Large Language Models access semantic memory using the Semantic Fluency Task, finding that LLMs exhibit similar memory foraging patterns to humans. The study reveals convergent and divergent search strategies in LLMs that mirror human cognitive behavior, potentially enabling better human-AI alignment or productive cognitive disalignment.