The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods
Researchers propose Semantic Softmax, a novel inference-time method that improves zero-shot LLM classification by recovering probability mass lost during constrained decoding. The approach aggregates scores from semantic synonyms, reducing calibration errors and boosting accuracy on emotion and toxicity detection tasks.
Large language models face a fundamental challenge when adapted for zero-shot classification: standard constrained decoding discards probability assigned to semantic synonyms outside the target label set, creating what the researchers term 'Renormalization Bias.' This phenomenon produces artificially inflated confidence scores and poor probability calibration, undermining model reliability in high-stakes applications. The 'Silent Vote' names the evidence that goes uncounted: when the softmax is restricted to a narrow label space, probability the model placed on synonymous tokens is silently filtered away, leaving the model overconfident in its predictions.
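A toy example makes the bias concrete. The sketch below (logit values are invented for illustration) computes a full-vocabulary softmax, then renormalizes over only two target labels, as constrained decoding does. Because the positive class's evidence is spread across synonyms, the constrained head-to-head comparison flips toward the wrong label:

```python
import math

def softmax(logits):
    """Standard softmax over a dict of token -> logit."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Hypothetical logits: the model spreads positive-class evidence
# across several synonyms, not just the canonical label token.
logits = {"happy": 2.0, "joyful": 1.8, "glad": 1.5,
          "sad": 2.2, "unhappy": 0.5}
full = softmax(logits)

# Constrained decoding: keep only the target labels and renormalize.
labels = ["happy", "sad"]
z = sum(full[t] for t in labels)
constrained = {t: full[t] / z for t in labels}

# The synonym mass on "joyful" and "glad" is silently discarded, so
# "sad" wins the renormalized comparison even though most of the
# positive-sentiment evidence sits on synonyms.
pos_mass = full["happy"] + full["joyful"] + full["glad"]
neg_mass = full["sad"] + full["unhappy"]
print(f"constrained P(happy) = {constrained['happy']:.3f}")
print(f"positive mass = {pos_mass:.3f}, negative mass = {neg_mass:.3f}")
```

Here the constrained decoder prefers "sad", while summing each class's semantic neighborhood shows the positive class actually holds more total probability.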
This research builds on growing recognition that LLM reliability extends beyond raw accuracy. As models become embedded in production systems—from content moderation to sentiment analysis—calibration becomes critical for risk assessment and threshold setting. Prior work has established that zero-shot performance degrades significantly under distribution shift, yet few solutions have addressed the fundamental probability-redistribution problem at inference time.
Semantic Softmax directly tackles this by leveraging the semantic structure already embedded in model representations. By aggregating neighboring semantic concepts, the method preserves information discarded during standard decoding. Evaluation on GoEmotions and Civil Comments datasets shows consistent improvements across Expected Calibration Error, Brier Score, AUROC, and Macro-F1—indicating gains in both calibration and discrimination without architecture changes.
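The aggregation idea can be sketched as follows. This is an illustrative approximation, not the paper's exact algorithm: the neighborhood lists and logit values are assumptions, and a real system would derive neighborhoods from the model's semantic space rather than hand-coding them. Each target label's probability is the summed full-vocabulary probability of its semantic neighborhood, renormalized over the label set:

```python
import math

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Hypothetical semantic neighborhoods: each target label plus surface
# synonyms the model may place probability on instead.
NEIGHBORHOODS = {
    "joy":   ["joy", "happiness", "delight"],
    "anger": ["anger", "rage", "fury"],
}

def semantic_softmax(logits, neighborhoods):
    """Aggregate full-vocabulary probabilities over each label's
    semantic neighborhood, then renormalize over the label set."""
    probs = softmax(logits)
    agg = {label: sum(probs.get(tok, 0.0) for tok in toks)
           for label, toks in neighborhoods.items()}
    z = sum(agg.values())
    return {label: p / z for label, p in agg.items()}

logits = {"joy": 1.0, "happiness": 1.4, "delight": 0.6,
          "anger": 1.5, "rage": 0.2, "fury": 0.1}
print(semantic_softmax(logits, NEIGHBORHOODS))
```

In this toy run "anger" holds the single highest logit, but aggregating over neighborhoods correctly awards more total probability to "joy", whose evidence is spread across synonyms.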
The approach has immediate practical implications for practitioners deploying LLMs in classification pipelines. Since it operates at inference time with no model retraining required, adoption barriers remain low. Future work should examine scalability across larger label sets and domain-specific semantic spaces, particularly for applications where miscalibration risks compound—financial risk assessment, medical diagnosis support, or legal document classification.
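For practitioners setting deployment thresholds, the Expected Calibration Error used in the paper's evaluation is straightforward to compute. The sketch below uses a standard equal-width binning formulation (the paper's exact binning scheme is not specified here); predictions are grouped by confidence, and each bin contributes the gap between its accuracy and its mean confidence, weighted by bin size:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average
    |accuracy - mean confidence| per bin, weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece

# An overconfident classifier: ~90% confidence but only 60% accuracy,
# the pattern Renormalization Bias produces. ECE is approximately 0.3.
confs = [0.9] * 10
hits = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(expected_calibration_error(confs, hits))
```

A well-calibrated classifier drives this gap toward zero, which is what makes confidence scores usable for threshold setting in the high-stakes settings the authors highlight.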
- Renormalization Bias causes LLMs to discard probability mass from semantic synonyms during constrained decoding, inflating false confidence
- Semantic Softmax recovers lost information by aggregating scores from semantic neighborhoods, improving both calibration and accuracy
- The method requires no model retraining and operates as an inference-time layer, enabling easy integration into existing pipelines
- Evaluation on emotion and toxicity datasets shows consistent improvements in Expected Calibration Error, Brier Score, AUROC, and Macro-F1
- Better calibrated zero-shot classifiers reduce deployment risks in high-stakes applications requiring reliable confidence estimates