y0news

#kernel-methods News & Analysis

9 articles tagged with #kernel-methods. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 28 · 7/10

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Researchers prove that large language models fundamentally cannot perform causal discovery through standard training methods, establishing this limitation as intrinsic to supervised learning rather than a model-specific flaw. They propose Agentic Causal Bayesian Optimization (A-CBO), which bypasses this constraint by using frozen language models as query oracles within an external optimization loop, achieving superior performance on causal inference benchmarks.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

Data-Aware Random Feature Kernel for Transformers

Researchers introduce DARKFormer, a new transformer architecture that reduces computational complexity from quadratic to linear while maintaining performance. The model uses data-aware random feature kernels to address efficiency issues in pretrained transformer models with anisotropic query-key distributions.
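
To illustrate how a random feature kernel turns quadratic attention into linear attention, here is a minimal NumPy sketch. It is not DARKFormer: it uses the generic positive random feature map with isotropic Gaussian projections, and the data-aware sampling described in the summary is not reproduced. All function names and sizes are hypothetical, chosen only for the example.

```python
import numpy as np

def positive_random_features(X, W):
    """Map rows of X to positive features whose inner products
    approximate exp(q . k) in expectation when W ~ N(0, I)."""
    m = W.shape[0]
    proj = X @ W.T                                      # (n, m)
    half_sq_norm = 0.5 * np.sum(X**2, axis=1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, W):
    """Attention in O(n * m) time: keys and values are summarized once,
    so the n-by-n score matrix is never formed."""
    phi_q = positive_random_features(Q, W)
    phi_k = positive_random_features(K, W)
    kv = phi_k.T @ V                                    # (m, d_v) summary
    z = phi_k.sum(axis=0)                               # (m,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def softmax_attention(Q, K, V):
    """Exact quadratic-time reference (scaling assumed folded into Q, K)."""
    scores = Q @ K.T
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d, m = 16, 4, 4096                                   # illustrative sizes
Q = 0.3 * rng.standard_normal((n, d))
K = 0.3 * rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))                         # isotropic, data-oblivious

approx = linear_attention(Q, K, V, W)
exact = softmax_attention(Q, K, V)
max_err = float(np.max(np.abs(approx - exact)))
```

The approximation error shrinks as the number of random features `m` grows, and it grows with the norms of the queries and keys, which is why the inputs here are kept small.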

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

Researchers propose Supervised Distributional Reduction (SDR), a machine learning algorithm combining optimal transport theory with dependence maximization to create compact data representations that preserve both geometric structure and predictive information. The method extends the Fused Gromov-Wasserstein framework and offers applications in representation learning and adaptive kernel design for Gaussian Process modeling.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression

Researchers demonstrate that standard transformer models with softmax attention can implement preconditioned Richardson iteration to solve Gaussian kernel ridge regression tasks during in-context learning. The theoretical construction and empirical validation reveal how transformers decompose nonlinear prediction into interpretable algorithmic steps, advancing mechanistic understanding of transformer capabilities.
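
For readers unfamiliar with the algorithm named above, the following is a minimal NumPy sketch of preconditioned Richardson iteration applied to Gaussian kernel ridge regression. It shows the numerical method only, not the paper's transformer construction. The row-sum diagonal preconditioner, the bandwidth, the regularization strength, and all function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def richardson_krr(K, y, lam=1.0, n_iters=1000):
    """Solve (K + lam * I) alpha = y with the update
    alpha <- alpha + P (y - A alpha), where P is diagonal.

    Assumed preconditioner: P = diag(1 / row sums of A). Because a
    Gaussian kernel matrix has positive entries, P A is row-stochastic
    with eigenvalues in (0, 1], so the iteration converges.
    """
    A = K + lam * np.eye(K.shape[0])
    p = 1.0 / A.sum(axis=1)            # diagonal of the preconditioner
    alpha = np.zeros_like(y)
    for _ in range(n_iters):
        residual = y - A @ alpha
        alpha = alpha + p * residual
    return alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 1))   # illustrative 1-D inputs
y = np.sin(3.0 * X[:, 0])
K = gaussian_kernel(X, X, bandwidth=0.5)

alpha_iter = richardson_krr(K, y, lam=1.0)
alpha_exact = np.linalg.solve(K + 1.0 * np.eye(20), y)
```

Each update needs only a matrix-vector product with the kernel matrix, which is the kind of step that an attention layer over the context points can plausibly express.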

AI · Neutral · arXiv – CS AI · May 12 · 6/10

One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning

Researchers propose a non-linear transformer architecture that enables reinforcement learning agents to generalize across domains through in-context learning, and they establish a theoretical connection between transformers and kernel-based temporal difference learning. By interpreting transformers as operators in a reproducing kernel Hilbert space (RKHS), the work shows that value functions from diverse domains can share a single set of weights. MetaWorld experiments validate the approach.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models

Researchers introduce CLP-DD, a novel dataset distillation method optimized for frozen pre-trained vision models using closed-form linear probing. The technique achieves comparable or superior performance to existing methods while running 14x faster and using 87.5% less GPU memory on ImageNet-1K.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Amortized Linear-time Exact Shapley Value for Product-Kernel Methods

Researchers introduce PKeX-Shapley, an algorithm that computes exact Shapley values for product-kernel machine learning models in quadratic time, eliminating the need for approximations. The method exploits the multiplicative structure of product kernels to achieve linear-time-per-feature attribution without sampling or density estimation, extending beyond predictive models to statistical discrepancy measures like MMD and HSIC.

AI · Bullish · arXiv – CS AI · Apr 20 · 6/10

Transformer Neural Processes - Kernel Regression

Researchers introduce Transformer Neural Process - Kernel Regression (TNP-KR), a scalable machine learning architecture that reduces the computational complexity of neural processes from O(n²) to O(n_c) while maintaining or exceeding accuracy. The architecture can process 100K context points with more than 1M test points on a single GPU, making neural processes more practical for large-scale applications.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10

Adaptive Personalized Federated Learning via Multi-task Averaging of Kernel Mean Embeddings

Researchers propose a new Personalized Federated Learning approach that automatically learns optimal collaboration weights between agents without prior knowledge of data heterogeneity. The method uses kernel mean embedding estimation to capture statistical relationships between agents and includes a practical implementation for communication-constrained federated settings.