
When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

arXiv – CS AI | Yasushi Sakai, Allen Song, Kent Larson | June 9, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce Propagational Proxy Voting (PPV), an unsupervised aggregation method for multi-sample LLM inference that outperforms standard majority voting on the MMLU-Pro benchmark by using semantic entropy and reasoning geometry signals. The method improves accuracy by +1.5 percentage points (pp) overall and by +2.24 pp on difficult questions, without requiring labeled data or auxiliary training.

Analysis

This research addresses an inefficiency in how multiple sampled LLM outputs are aggregated. Standard majority voting counts each sample as one equal-weight vote, discarding information about model confidence and reasoning consistency. PPV uses two signals that majority voting ignores: within-sample semantic entropy (how confidently a model expresses its answer) and between-sample geometric coherence (whether reasoning paths align in embedding space).

The method partitions 128 samples into 16 groups, then computes semantic entropy and embedding centroids to construct a stochastic delegation matrix that weights each voter's influence according to these signals. The gains on MMLU-Pro are statistically significant (p ≈ 1.0e-14), so they are unlikely to be sampling noise.

In one illustrative case, PPV overturns a 10-6 majority vote: the minority cluster is geometrically coherent (+0.26 cosine similarity) while the majority cluster is incoherent (-0.02), which suggests the minority's reasoning is more internally consistent.

This finding matters for production LLM systems, where inference-time sampling is computationally expensive. Better aggregation improves cost-efficiency by extracting more signal from samples that have already been generated. The paper also reports negative results: confidence-based ensemble methods cannot close the gap to oracle performance, which helps establish principled boundaries for unsupervised aggregation research. The work is particularly relevant as practitioners increasingly use multi-sample inference to improve LLM reliability without fine-tuning, which makes the quality of the aggregation step more important.
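The delegation idea described above can be sketched in a few lines of Python. This is an illustration only, not the paper's implementation: the summary does not say how the delegation matrix is scored or how weight is propagated, so the scoring rule (cosine similarity minus entropy, passed through a softmax), the `temperature` parameter, the fixed number of propagation steps, and every function name below are assumptions. Each group votes with its most common answer, hands its voting weight to other groups through a row-stochastic matrix, and the final tally is weighted by where the weight ends up.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (nats) of the answer distribution inside one group."""
    n = len(answers)
    return sum(-(c / n) * math.log(c / n) for c in Counter(answers).values())


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def delegation_matrix(entropies, centroids, temperature=1.0):
    """Row-stochastic matrix D; D[i][j] is the share of group i's weight given to j."""
    rows = []
    for ci in centroids:
        # Assumed score: similar, low-entropy groups attract more delegated weight.
        scores = [(cosine(ci, cj) - ej) / temperature
                  for cj, ej in zip(centroids, entropies)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([x / total for x in exps])
    return rows


def propagate(D, steps=50):
    """Start with one unit of weight per group and pass it through D repeatedly."""
    n = len(D)
    w = [1.0] * n
    for _ in range(steps):
        w = [sum(w[i] * D[i][j] for i in range(n)) for j in range(n)]
    return w


def weighted_winner(answers, weights):
    """Return the answer with the largest total weight."""
    tally = {}
    for a, w in zip(answers, weights):
        tally[a] = tally.get(a, 0.0) + w
    return max(tally, key=tally.get)


def delegate_and_vote(groups, centroids):
    """groups: list of answer lists; centroids: one embedding vector per group."""
    top_answers = [Counter(g).most_common(1)[0][0] for g in groups]
    entropies = [answer_entropy(g) for g in groups]
    weights = propagate(delegation_matrix(entropies, centroids))
    return weighted_winner(top_answers, weights)
```

Because every row of the matrix sums to 1, propagation only redistributes voting weight and never creates or destroys it. With a uniform matrix the procedure reduces to an ordinary majority vote over the groups, so any departure from the majority outcome comes from the entropy and geometry signals.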

Key Takeaways

→PPV improves on majority voting on MMLU-Pro by +1.5 percentage points overall and +2.24 pp on difficult questions by incorporating semantic entropy and reasoning geometry signals
→The method requires no labeled data, auxiliary training, or external models, making it practical for deployment in existing inference pipelines
→Geometric incoherence of majority clusters can indicate incorrect consensus, revealing cases where minority reasoning is more internally consistent
→Research identifies fundamental limits: confidence-based modes cannot fully close the gap to oracle performance in unsupervised aggregation
→The approach extracts additional signal from existing multi-sample inference computations, improving cost-efficiency without additional forward passes
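One way to make the geometric coherence signal concrete is the mean pairwise cosine similarity among the embedding vectors of the voters that share an answer. The summary reports coherence values (+0.26 and -0.02) but not the exact formula, so the definition below and the names `cosine` and `coherence` are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def coherence(vectors):
    """Mean cosine similarity over all unordered pairs (needs at least two vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Under this definition, a cluster whose vectors point in similar directions scores close to 1, while a cluster of scattered, roughly orthogonal vectors scores near 0. A large but scattered majority would therefore score lower than a small, tightly aligned minority.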

#llm-inference #aggregation-methods #majority-voting #semantic-entropy #mmlu-benchmark #unsupervised-learning #reasoning-geometry #model-ensemble

Read Original →via arXiv – CS AI
