y0news

When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

arXiv – CS AI | Yasushi Sakai, Allen Song, Kent Larson
🤖AI Summary

Researchers introduce Propagational Proxy Voting (PPV), an unsupervised aggregation method for multi-sample LLM inference that outperforms standard majority voting on the MMLU-Pro benchmark by using semantic entropy and reasoning geometry signals. The method improves accuracy by +1.5 percentage points overall and by +2.24 percentage points on difficult questions, without requiring labeled data or auxiliary training.

Analysis

This research addresses an inefficiency in how large language models aggregate multiple sampled outputs. Traditional majority voting counts each sample as one equally weighted vote for its final answer, discarding information about model confidence and reasoning consistency. PPV captures two signals that majority voting ignores: within-sample semantic entropy (how confidently a model expresses its answer) and between-sample geometric coherence (whether reasoning paths align in embedding space). The method partitions 128 samples into 16 groups and computes semantic entropy and embedding centroids, which together define a stochastic delegation matrix that weights each voter's influence according to these signals. The improvement on MMLU-Pro is statistically significant (p ≈ 1.0e-14), so it is very unlikely to be a chance effect.

The paper illustrates a case in which PPV overturns a 10-6 majority vote: the minority cluster is geometrically coherent (+0.26 cosine similarity), while the majority cluster is incoherent (-0.02), indicating that the minority's reasoning is more internally consistent.

This finding has implications for production LLM systems, where inference-time sampling is computationally expensive. A better aggregator improves cost-efficiency by extracting more signal from each forward pass. The negative results, which show that confidence-based ensemble methods cannot close the gap to oracle performance, help establish principled boundaries for unsupervised aggregation research. The work is particularly relevant as practitioners increasingly use multi-sample inference to improve LLM reliability without fine-tuning, which makes the quality of the aggregation step increasingly consequential.
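The delegation idea described above can be illustrated with a toy aggregator. This is a minimal sketch under stated assumptions, not the authors' PPV: the summary does not give the exact delegation rule, so the weighting used here (group g delegates to group h in proportion to h's confidence, exp(-entropy), times exp(cosine similarity of the two centroids / temperature)), the power-iteration propagation, and the names `delegation_vote`, `answer_entropy`, and `temperature` are all hypothetical. Entropy over exact answer labels stands in for true semantic entropy, and it is computed per group rather than per sample.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (nats) of the answer labels in one group.

    Stand-in for semantic entropy (assumption): identical labels are treated
    as one meaning cluster instead of clustering free text by meaning.
    """
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def delegation_vote(answers, embeddings, n_groups=16, temperature=1.0, iters=50):
    """Hypothetical delegation-based aggregator (not the paper's exact PPV).

    answers[i] is sample i's final answer; embeddings[i] embeds its reasoning.
    Returns (winning answer, {answer: total delegated influence}).
    """
    n = len(answers)
    if n % n_groups:
        raise ValueError("number of samples must be divisible by n_groups")
    size = n // n_groups
    groups = [range(g * size, (g + 1) * size) for g in range(n_groups)]

    # Each group casts one vote: its internal plurality answer.
    votes = [Counter(answers[i] for i in grp).most_common(1)[0][0] for grp in groups]
    # Low entropy means high confidence; exp(-H) lies in (0, 1].
    confidence = [math.exp(-answer_entropy([answers[i] for i in grp])) for grp in groups]
    cents = [centroid([embeddings[i] for i in grp]) for grp in groups]

    # Row-stochastic delegation matrix: g passes weight to h in proportion to
    # h's confidence and how well their reasoning centroids align.
    D = []
    for g in range(n_groups):
        row = [confidence[h] * math.exp(cosine(cents[g], cents[h]) / temperature)
               for h in range(n_groups)]
        total = sum(row)
        D.append([w / total for w in row])

    # Propagate influence by power iteration (influence <- influence @ D).
    # Every entry of D is positive, so this converges to a unique distribution.
    influence = [1.0 / n_groups] * n_groups
    for _ in range(iters):
        influence = [sum(influence[g] * D[g][h] for g in range(n_groups))
                     for h in range(n_groups)]

    tally = Counter()
    for g in range(n_groups):
        tally[votes[g]] += influence[g]
    return tally.most_common(1)[0][0], dict(tally)


if __name__ == "__main__":
    # Three groups lean "A" but are internally scattered; one is unanimous "B".
    answers = ["A", "A", "A", "C", "D", "E", "F", "G"] * 3 + ["B"] * 8
    embeddings = [[1.0, 0.0]] * len(answers)  # identical traces: only entropy differs
    print(Counter(answers).most_common(1)[0][0])  # flat majority: A
    print(delegation_vote(answers, embeddings, n_groups=4)[0])  # delegation: B
```

In the demo, a flat count over all 32 samples picks A (9 votes to 8), but the three A-leaning groups are internally scattered while the B group is unanimous, so most of the delegated influence flows to B. Because every embedding is identical, the geometric term plays no role here; distinct embeddings would let centroid alignment reshape the delegation matrix as well.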

Key Takeaways
  • PPV improves on majority voting by +1.5 percentage points overall and +2.24 percentage points on difficult MMLU-Pro questions by incorporating semantic entropy and reasoning geometry signals
  • The method requires no labeled data, auxiliary training, or external models, making it practical for deployment in existing inference pipelines
  • Geometric incoherence of majority clusters can indicate incorrect consensus, revealing cases where minority reasoning is more internally consistent
  • Research identifies fundamental limits: confidence-based modes cannot fully close the gap to oracle performance in unsupervised aggregation
  • The approach extracts additional signal from existing multi-sample inference computations, improving cost-efficiency without additional forward passes
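The claim that an incoherent majority cluster can signal a wrong consensus can be made concrete with a small check. The summary does not say how coherence is measured, so this sketch assumes it is the mean pairwise cosine similarity of reasoning embeddings within each answer cluster; `coherence_by_answer` and `majority_is_less_coherent` are hypothetical helpers, and the 2-D vectors are toy data, not figures from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coherence_by_answer(answers, embeddings):
    """Mean pairwise cosine similarity of reasoning embeddings per answer.

    Single-member clusters have no pairs, so their coherence is math.nan.
    """
    clusters = {}
    for ans, vec in zip(answers, embeddings):
        clusters.setdefault(ans, []).append(vec)
    result = {}
    for ans, vecs in clusters.items():
        pairs = list(combinations(vecs, 2))
        result[ans] = (sum(cosine(a, b) for a, b in pairs) / len(pairs)
                       if pairs else math.nan)
    return result


def majority_is_less_coherent(answers, embeddings):
    """Hypothetical heuristic: flag a majority less coherent than some minority."""
    coh = coherence_by_answer(answers, embeddings)
    majority = Counter(answers).most_common(1)[0][0]
    others = [c for a, c in coh.items() if a != majority and not math.isnan(c)]
    return bool(others) and coh[majority] < max(others)


if __name__ == "__main__":
    answers = ["A", "A", "A", "B", "B"]
    # Majority "A" reasons in scattered directions; minority "B" is aligned.
    embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0], [1.0, 0.9]]
    print(coherence_by_answer(answers, embeddings))
    print(majority_is_less_coherent(answers, embeddings))  # True
```

Here cluster A averages about -0.33 while cluster B is close to 1.0, mirroring (in exaggerated form) the coherent-minority versus incoherent-majority pattern described in the analysis.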