y0news

#model-calibration News & Analysis

6 articles tagged with #model-calibration. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

🧠 AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Variational Visual Question Answering for Uncertainty-Aware Selective Prediction

Researchers demonstrate that variational Bayesian methods significantly improve Vision Language Models' reliability for Visual Question Answering tasks by enabling selective prediction with reduced hallucinations and overconfidence. The proposed Variational VQA approach shows particular strength at low error tolerances and offers a practical path to making large multimodal models safer without proportional computational costs.
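The selective-prediction idea is easy to sketch: average class probabilities over several posterior samples, then answer only when confidence clears a threshold. A minimal illustration (the averaging and the threshold `tau` are generic stand-ins, not the paper's exact procedure):

```python
import numpy as np

def selective_predict(sample_probs, tau=0.7):
    # sample_probs: (n_samples, n_examples, n_classes) probabilities,
    # e.g. from several draws of a variational posterior.
    mean_probs = sample_probs.mean(axis=0)
    confidence = mean_probs.max(axis=1)
    prediction = mean_probs.argmax(axis=1)
    answered = confidence >= tau  # abstain below the threshold
    return prediction, answered

# Toy input: 3 posterior samples, 2 questions, 3 candidate answers.
probs = np.array([
    [[0.80, 0.10, 0.10], [0.40, 0.35, 0.25]],
    [[0.70, 0.20, 0.10], [0.30, 0.40, 0.30]],
    [[0.90, 0.05, 0.05], [0.35, 0.33, 0.32]],
])
pred, answered = selective_predict(probs, tau=0.7)
```

Lowering `tau` raises coverage at the cost of more wrong answers; the gains the paper reports at low error tolerances correspond to the high-`tau` end of this trade-off.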

🧠 AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation

Researchers propose Evidential Transformation Network (ETN), a lightweight post-hoc module that converts pretrained models into evidential models for uncertainty estimation without retraining. ETN operates in logit space using sample-dependent affine transformations and Dirichlet distributions, demonstrating improved uncertainty quantification across vision and language benchmarks with minimal computational overhead.
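As a rough picture of the logit-space construction (the fixed `scale`/`shift` below stand in for ETN's learned, sample-dependent affine transform; the paper's exact parameterization may differ):

```python
import numpy as np

def logits_to_dirichlet(logits, scale=1.0, shift=0.0):
    # Affine-transform logits, exponentiate to non-negative evidence,
    # and treat evidence + 1 as Dirichlet concentrations.
    evidence = np.exp(scale * logits + shift)
    alpha = evidence + 1.0
    probs = alpha / alpha.sum(axis=-1, keepdims=True)
    K = logits.shape[-1]
    vacuity = K / alpha.sum(axis=-1)  # higher = more uncertain
    return probs, vacuity

# Confident vs. flat logits over three classes.
probs_conf, vac_conf = logits_to_dirichlet(np.array([5.0, 0.0, 0.0]))
probs_flat, vac_flat = logits_to_dirichlet(np.array([0.0, 0.0, 0.0]))
```

Flat logits yield low total evidence and hence high vacuity, which is the post-hoc uncertainty signal a module like this can read off without retraining the backbone.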

🧠 AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning

Researchers introduce VOLTA, a simplified deep learning approach for uncertainty quantification that outperforms ten established baselines including ensemble methods and MC Dropout. The method achieves superior calibration with expected calibration error of 0.010 and competitive accuracy across multiple datasets, suggesting that complex auxiliary losses may be unnecessary for reliable uncertainty estimation in safety-critical applications.
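For reference, the expected calibration error figure quoted here is the standard binned gap between confidence and accuracy:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Binned ECE: weighted absolute gap between mean confidence and
    # accuracy within each confidence bin.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.sum() / n * gap
    return ece

# Toy example: two bins, each 5 points too confident on average.
ece = expected_calibration_error(
    np.array([0.95, 0.95, 0.55, 0.55]),
    np.array([1.0, 1.0, 1.0, 0.0]),
)
```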

🧠 AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Researchers demonstrate that Large Language Models used as judges suffer from score range bias, where evaluation outputs are highly sensitive to predefined scoring scales. Using contrastive decoding techniques, they achieve up to 11.7% improvement in alignment with human judgments across different score ranges.
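One way to picture the fix (an illustrative sketch, not the authors' exact formulation): contrast the judge's log-probabilities over candidate score tokens with a baseline distribution that sees no rubric, so the model's prior over the score range cancels out.

```python
import math

def contrastive_score_dist(logp_judge, logp_baseline, alpha=1.0):
    # Subtract alpha * baseline log-probs from the judge's log-probs
    # over candidate score tokens, then renormalize via softmax.
    adj = {s: logp_judge[s] - alpha * logp_baseline[s] for s in logp_judge}
    m = max(adj.values())
    total = sum(math.exp(v - m) for v in adj.values())
    return {s: math.exp(adj[s] - m) / total for s in adj}

# Toy numbers: the baseline leans toward the ends of a 1-10 scale.
judge = {1: -2.0, 5: -0.5, 10: -1.5}
baseline = {1: -0.7, 5: -2.0, 10: -0.9}
dist = contrastive_score_dist(judge, baseline)
```

After the subtraction, the mid-scale score the judge genuinely favors dominates rather than the range endpoints the baseline prior inflates.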

🧠 AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

Researchers introduce a framework based on signal detection theory to evaluate how well Large Language Models understand their own knowledge limitations, finding that traditional confidence metrics miss key differences between models. The study reveals that models with similar accuracy can have vastly different metacognitive abilities: their capacity to know what they don't know.
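In signal-detection terms, sensitivity is the z-scored gap between hit and false-alarm rates; metacognitive measures build on the same quantity applied to confidence ratings. This sketch is the textbook type-1 d', not the paper's full metacognitive-efficiency fit:

```python
from statistics import NormalDist

def dprime(hit_rate, false_alarm_rate):
    # d' = z(H) - z(F): separation, in standard-deviation units,
    # between the signal and noise distributions.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# A model that says "I know this" for 80% of items it gets right
# but also for 20% of items it gets wrong:
sensitivity = dprime(0.8, 0.2)
```

Two models can share accuracy yet differ in d' over their own correctness, which is exactly the distinction plain confidence averages wash out.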

🧠 Llama

🧠 AI · Neutral · arXiv – CS AI · Mar 3 · 5/10

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

Researchers developed a conformal prediction framework for Large Language Models used in medical entity extraction, testing on FDA drug labels and radiology reports. The study found that model calibration varies significantly across clinical domains, with models being underconfident on structured data but overconfident on free-text reports, achieving 90% target coverage with 9-13% rejection rates.
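The coverage guarantee behind numbers like these rests on one quantile computation over a held-out calibration set. A minimal split-conformal sketch (the 90% figure matches the target coverage reported above; the task-specific nonconformity scoring is omitted):

```python
import math

def conformal_threshold(cal_scores, alpha=0.10):
    # Split conformal: with n exchangeable calibration scores, the
    # ceil((n + 1) * (1 - alpha))-th smallest score gives
    # >= 1 - alpha coverage on a fresh test point.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

# 100 calibration scores on a 0-1 scale.
threshold = conformal_threshold([i / 100 for i in range(1, 101)], alpha=0.10)
```

Extractions whose nonconformity exceeds the threshold are rejected rather than returned, which is where rejection rates like the 9-13% above come from.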