#information-theory News & Analysis

35 articles tagged with #information-theory. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Estimating the Empowerment of Language Model Agents

Researchers propose EELMA, an algorithm that uses information-theoretic empowerment to evaluate language model agents at scale without manual benchmarking. The method measures an agent's ability to influence future states through its actions and demonstrates strong correlation with task performance across text-based, web, and tool-use environments.
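For readers unfamiliar with the term, empowerment is conventionally defined as the channel capacity between an agent's actions and the states that follow. The formula below is that textbook definition, not necessarily the estimator EELMA uses; the symbols $s$, $A$, and $S'$ (current state, action, next state) are notation assumed here for illustration.

```latex
\mathcal{E}(s) \;=\; \max_{p(a)} I(A; S' \mid s)
\;=\; \max_{p(a)} \sum_{a,\,s'} p(a)\, p(s' \mid s, a)\,
\log \frac{p(s' \mid s, a)}{\sum_{a'} p(a')\, p(s' \mid s, a')}
```

A high value means the agent's choice of action strongly determines which state comes next; a value of zero means its actions make no difference.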

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

Researchers propose a self-captioning workflow with a Multimodal Interaction Gate to improve vision language models by amplifying redundant information between vision and text modalities. The approach addresses hallucination and robustness issues by converting unique modal interactions into shared redundancies, reducing visual-induced errors by 38.3% and improving consistency by 16.8%.

AI · Bearish · arXiv – CS AI · May 9 · 7/10

Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective

Researchers propose a unified dynamical systems model of human-AI co-evolution, showing that increased reliance on LLMs creates feedback loops between human cognition, data quality, and model capability. The analysis identifies three regimes including a 'degenerative convergence' where over-reliance on AI leads to reduced diversity and an information bottleneck, suggesting AI trajectory depends as much on human behavioral dynamics as on model design.

AI · Bearish · arXiv – CS AI · May 9 · 7/10

Large Vision-Language Models Get Lost in Attention

Researchers have identified a critical architectural flaw in large vision-language models: attention mechanisms are largely redundant and misallocate computational resources, with random attention weights performing comparably to learned ones. This finding challenges fundamental assumptions about Transformer design and suggests current LVLMs inefficiently process visual information despite their scale.

AI · Neutral · arXiv – CS AI · May 7 · 7/10

The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

Researchers identify the 'Reasoning Trap,' a fundamental information-theoretic limitation where multi-agent language model debates preserve answer accuracy while degrading reasoning quality. The study introduces the Supported Faithfulness Score metric and Evidence-Grounded Socratic Reasoning framework, demonstrating that closed-system reasoning protocols following standard multi-agent debate structures inevitably lose information fidelity according to the Data Processing Inequality.
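The Data Processing Inequality cited here can be stated in its standard form as follows. Mapping it onto debate rounds is an illustrative assumption rather than the paper's own formulation: $E$ stands for the external evidence and $R_1, \dots, R_k$ for successive reasoning steps, each of which sees only the output of the previous step.

```latex
E \to R_1 \to R_2 \to \cdots \to R_k
\quad\Longrightarrow\quad
I(E; R_k) \;\le\; I(E; R_{k-1}) \;\le\; \cdots \;\le\; I(E; R_1)
```

Under that Markov-chain assumption, no later step can carry more information about the evidence than an earlier one, which is why a closed system cannot recover fidelity once it is lost.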

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Information as Structural Alignment: A Dynamical Theory of Continual Learning

Researchers introduce the Informational Buildup Framework (IBF), a new approach to continual learning that eliminates catastrophic forgetting by treating information as structural alignment rather than stored parameters. The framework demonstrates superior performance across multiple domains including chess and image classification, achieving near-zero forgetting without requiring raw data replay.

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Incompleteness of AI Safety Verification via Kolmogorov Complexity

Researchers prove a fundamental theoretical limit in AI safety verification using Kolmogorov complexity theory. They demonstrate that no finite formal verifier can certify all policy-compliant AI instances of arbitrarily high complexity, revealing intrinsic information-theoretic barriers beyond computational constraints.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Uncertainty Quantification and Data Efficiency in AI: An Information-Theoretic Perspective

This research review examines methodologies for addressing AI systems' challenges with limited training data through uncertainty quantification and synthetic data augmentation. The paper presents formal approaches including Bayesian learning frameworks, information-theoretic bounds, and conformal prediction methods to improve AI performance in data-scarce environments like robotics and healthcare.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

Researchers introduce a novel optimization framework that integrates the Minimum Description Length (MDL) principle directly into deep neural network training dynamics. The method uses geometrically-grounded cognitive manifolds with coupled Ricci flow to create autonomous model simplification while maintaining data fidelity, with theoretical guarantees for convergence and practical O(N log N) complexity.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Emergent Coordination in Multi-Agent Language Models

Researchers developed an information-theoretic framework to measure when multi-agent AI systems exhibit coordinated behavior beyond individual agents. The study found that specific prompt designs can transform collections of AI agents into coordinated collectives that mirror human group intelligence principles.

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

Researchers identified a fundamental limitation in multimodal LLMs where decoders trained on text cannot effectively utilize non-text information like speaker identity or visual textures, despite this information being preserved through all model layers. The study demonstrates this 'modality collapse' is due to decoder design rather than encoding failures, with experiments showing targeted training can improve specific modality accessibility.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

Researchers provide the first rigorous theoretical analysis of temperature scaling, a widely-used technique for controlling uncertainty in machine learning models. The study reveals that while temperature scaling reliably increases entropy in classifiers, it does not necessarily increase diversity in large language models as commonly claimed, and establishes temperature scaling as the unique linear calibration method that preserves hard predictions.
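The two classifier properties mentioned in this summary can be checked on a toy example. The sketch below is a minimal illustration, not the paper's analysis: the logits, the temperature values, and the helper names `softmax_with_temperature` and `entropy` are all assumptions chosen for demonstration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature); temperature must be positive."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits for a 3-class classifier.
logits = [2.0, 1.0, 0.1]

# Map each temperature to (predicted class index, entropy of the distribution).
results = {}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    results[t] = (probs.index(max(probs)), entropy(probs))

for t, (pred, h) in results.items():
    print(f"T={t}: predicted class {pred}, entropy {h:.3f} nats")
```

For this single logit vector, the predicted class stays the same at every temperature while the entropy grows as the temperature rises. Whether sampled text from a language model becomes more diverse is a separate question that this sketch does not address.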

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

Noise Scheduling as Information-Guided Allocation in Diffusion Training

Researchers introduce InfoNoise, an adaptive noise scheduling method for diffusion model training that dynamically reallocates computational resources toward the most informative denoising levels. By estimating conditional-entropy-rate profiles during training, the approach matches or exceeds fixed schedules on image benchmarks while achieving up to 3x computational efficiency gains on diverse tasks including DNA and language generation.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Generalization Bounds of Emergent Communications for Agentic AI Networking

Researchers propose a novel emergent communication framework for 6G agentic AI networks that enables autonomous agents to learn their own communication protocols while accounting for physical networking constraints. The framework applies information-theoretic principles to quantify trade-offs between task-relevant information and computational complexity, with experimental validation showing improved generalization performance.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Researchers discover that neural networks across different modalities (vision, point clouds, language) converge toward shared representations, with non-language modalities systematically moving toward language's neighborhood structure rather than vice versa. Using directional analysis, they attribute this asymmetry to language representations occupying more compact feature space, proposing that language serves as the asymptotic attractor in multimodal representation learning.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Neural Information Causality

Researchers present Neural Information Causality (Neural-IC), a theoretical framework that formalizes how neural network representations function as communication channels under query-separated computation. The work establishes operational bounds on information leakage through bottlenecks and demonstrates that quantum advantages in specific architectures depend on fair query-conditioned access rather than total information capacity.

🏢 Meta

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Information Density as a Quantitative Measure for AI-enabled Virtual Sensing: Feasibility and Limits

Researchers propose Information Density as a quantitative framework for optimizing IoT sensor networks by enabling virtual sensing through AI. Using spatial, temporal, and cross-modal correlations, the system can replace physical sensors with computational models while maintaining sub-4% error margins, demonstrated via Madrid's smart city infrastructure.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

BALAR : A Bayesian Agentic Loop for Active Reasoning

Researchers introduced BALAR, a Bayesian algorithm that enables large language models to engage in structured multi-turn dialogue by actively reasoning about missing information and strategically asking clarifying questions. The system demonstrated significant performance improvements across three diverse benchmarks—14.6% to 38.5% higher accuracy—without requiring fine-tuning, suggesting a more principled approach to interactive AI reasoning.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

SANEmerg: An Emergent Communication Framework for Semantic-aware Agentic AI Networking

SANEmerg is a new multi-agent emergent communication framework designed to optimize networking in AI-native systems by enabling autonomous agents to develop task-specific communication protocols. The framework addresses bandwidth and computational constraints through intelligent message prioritization and complexity regularization, demonstrating significant performance improvements over existing solutions.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Information Theoretic Adversarial Training of Large Language Models

Researchers propose WARDEN, an information-theoretic adversarial training framework that improves Large Language Model robustness against prompt attacks by dynamically reweighting adversarial examples using f-divergence principles. The method achieves comparable computational efficiency to existing approaches while substantially reducing attack success rates, advancing the scalability of AI safety mechanisms.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

Researchers reveal that large language models develop distinct hierarchical processing stages (Local, Intermediate, Global) determined by architecture family rather than model size. Using information theory, they demonstrate that Llama and Qwen models show dramatically different brittleness patterns across layers, with architectural design — not scaling — as the primary driver of model behavior.

🧠 Llama

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Why Self-Supervised Encoders Want to Be Normal

Researchers develop a theoretical framework connecting Information Bottleneck principles to encoder-decoder learning through rate-distortion analysis, showing optimal representations form soft clusters on probability manifolds. The work introduces Sketched Isotropic Gaussian Regularization (SIGReg) as a principled regularizer for self-supervised, semi-supervised, and supervised learning without requiring variational bounds.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Tail-Aware Information-Theoretic Generalization for RLHF and SGLD

Researchers develop a new information-theoretic framework that handles heavy-tailed data distributions, addressing limitations in classical generalization bounds used in machine learning. The work applies specifically to reinforcement learning from human feedback (RLHF) and stochastic gradient optimization, where traditional KL-divergence tools fail due to non-existent moment generating functions.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

Researchers reveal that unified multimodal models (UMMs) combining language and vision capabilities fail to achieve genuine synergy, exhibiting divergent information patterns that undermine reasoning transfer to image synthesis. An information-theoretic framework analyzing ten models shows pseudo-unification stems from asymmetric encoding and conflicting response patterns, with only models implementing contextual prediction achieving stronger text-to-image reasoning.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Understanding Generalization in Role-Playing Models via Information Theory

Researchers introduce R-EMID, an information-theoretic metric to diagnose how distribution shifts degrade role-playing model performance in real-world deployments. The framework reveals that user shifts pose the greatest generalization risk, while co-evolving reinforcement learning provides the most effective mitigation strategy.
