#semantic-entropy News & Analysis

8 articles tagged with #semantic-entropy. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

Researchers propose MACR, a novel framework that resolves conflicts between large language models' internal knowledge and external context information using multi-agent reasoning. The approach moves beyond binary choice paradigms to actively reconcile inconsistencies, demonstrating significant performance improvements over existing methods while providing interpretable conflict resolution.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

Automated Creativity Evaluation of Language Models Across Open-Ended Tasks

Researchers introduce an automated, domain-agnostic framework for evaluating creativity in large language models across open-ended tasks. The approach uses semantic entropy to measure divergent creativity and a multi-agent judge system for convergent creativity, validated across problem-solving, research ideation, and creative writing domains.
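For readers new to the tag, here is a minimal sketch of how semantic entropy is commonly computed: sample several answers, group them by meaning, and take the entropy of the group frequencies. This is not the paper's implementation. The `same_meaning` callback and the `toy_same_meaning` string matcher are hypothetical stand-ins for the entailment or LLM-based equivalence check a real system would use.

```python
import math
from typing import Callable, List


def semantic_entropy(samples: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy (in nats) over meaning clusters of sampled answers."""
    clusters: List[List[str]] = []
    for s in samples:
        # Greedy clustering: join the first cluster whose representative matches.
        for c in clusters:
            if same_meaning(s, c[0]):
                c.append(s)
                break
        else:
            clusters.append([s])
    n = len(samples)
    # Cluster frequencies act as an empirical distribution over meanings.
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical equivalence check: case- and whitespace-insensitive match."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


if __name__ == "__main__":
    answers = ["Paris", "paris", "Lyon", " LYON "]
    print(semantic_entropy(answers, toy_same_meaning))
```

Higher values mean the samples spread across more distinct meanings, which is why the same quantity can serve as a diversity signal or as an uncertainty signal.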

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

FASE: Fast Adaptive Semantic Entropy for Code Quality

Researchers introduce FASE (Fast Adaptive Semantic Entropy), a novel metric for evaluating code quality in multi-agent AI systems that reduces computational costs by 99.7% while improving accuracy by 25% compared to existing semantic entropy methods. The approach uses structural and semantic dissimilarity graphs instead of expensive LLM-driven equivalence checks, offering practical uncertainty quantification for autonomous software development.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy

Researchers introduce VALTEST, a framework that uses semantic entropy to automatically validate test cases generated by Large Language Models, addressing the problem of invalid or hallucinated tests that mislead AI programming agents. The system improves test validity by up to 29% and enhances code generation performance through better filtering of LLM-generated test cases.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

Researchers introduce Propagational Proxy Voting (PPV), an unsupervised aggregation method for multi-sample LLM inference that outperforms standard majority voting on the MMLU-Pro benchmark by leveraging semantic entropy and reasoning geometry signals. The method achieves an overall improvement of 1.5 percentage points, and 2.24 percentage points on difficult questions, without requiring labeled data or auxiliary training.
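For context, the majority-voting baseline that such aggregators are compared against can be sketched in a few lines. This shows only the baseline, not PPV. The function name `majority_vote` is illustrative, and ties go to the answer seen first.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent sampled answer (first-seen wins on ties)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns a list holding one (answer, count) pair.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(majority_vote(["B", "A", "B", "C"]))
```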

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models

Researchers introduce BEACON, a black-box hallucination detection framework for large language models that achieves 81.23% accuracy by analyzing model outputs without requiring internal access. The method combines multiple uncertainty signals including semantic entropy and consistency checks, outperforming existing baselines and offering practical deployment options across commercial LLM APIs.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Active Testing of Large Language Models via Approximate Neyman Allocation

Researchers introduce a novel active testing algorithm that reduces evaluation costs for large language models by intelligently sampling from evaluation pools using semantic entropy and approximate Neyman allocation. The method achieves up to 28% MSE reduction over uniform sampling while saving an average of 22.9% of evaluation budget across multiple benchmarks.
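As background, classical Neyman allocation splits a sampling budget across strata in proportion to stratum size times within-stratum standard deviation. The sketch below shows that textbook rule only. How the paper forms strata or estimates the deviations is not stated in the summary, so `sizes` and `stds` are assumed inputs, and the proportional fallback for zero variance is an illustrative choice.

```python
from typing import List


def neyman_allocation(budget: int, sizes: List[int],
                      stds: List[float]) -> List[float]:
    """Fractional sample counts per stratum: n_h proportional to N_h * sigma_h."""
    weights = [n_h * s_h for n_h, s_h in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # Assumed fallback: with no estimated variance, allocate by size alone.
        return [budget * n_h / sum(sizes) for n_h in sizes]
    return [budget * w / total for w in weights]


if __name__ == "__main__":
    # Two equally sized strata; the noisier one receives more of the budget.
    print(neyman_allocation(100, [500, 500], [1.0, 3.0]))
```

In practice the fractional counts would still need rounding to integers and capping at each stratum's size.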

AI · Neutral · arXiv – CS AI · May 7 · 6/10

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

Researchers propose Adaptive Conformal Semantic Entropy (ACSE), a novel method for quantifying uncertainty in large language model outputs by measuring semantic diversity rather than relying solely on lexical or probabilistic measures. The approach uses conformal calibration to provide statistical guarantees on error rates, demonstrating significant performance improvements over existing uncertainty quantification baselines.
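To illustrate what conformal calibration means in general, the sketch below computes the standard split-conformal threshold from held-out nonconformity scores. It is the textbook procedure, not necessarily the one ACSE uses. The name `conformal_threshold` and the idea of using an entropy-style uncertainty value as the score are assumptions made for illustration.

```python
import math
from typing import List


def conformal_threshold(cal_scores: List[float], alpha: float) -> float:
    """Split-conformal cutoff: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite cutoff exists.
        return math.inf
    return sorted(cal_scores)[k - 1]


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8]
    print(conformal_threshold(scores, alpha=0.5))
```

Under the usual exchangeability assumption, a new example's score falls at or below this cutoff with probability at least 1 - alpha, which is the kind of statistical guarantee the summary refers to.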