AI · Neutral · arXiv – CS AI · May 28 · 7/10
🧠Researchers prove that large language models fundamentally cannot perform causal discovery through standard training methods, establishing this limitation as intrinsic to supervised learning rather than a model-specific flaw. They propose Agentic Causal Bayesian Optimization (A-CBO), which sidesteps this constraint by using frozen language models as query oracles within an external optimization loop, and report superior performance on causal inference benchmarks.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce DARKFormer, a new transformer architecture that reduces the computational complexity of attention from quadratic to linear in sequence length while maintaining performance. The model uses data-aware random feature kernels to address the efficiency issues that arise in pretrained transformer models with anisotropic query-key distributions.
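For readers unfamiliar with random-feature attention, the sketch below shows the generic, isotropic version of the idea (positive random features in the style of Performer/FAVOR+), not DARKFormer's data-aware kernel, whose construction is not described here. All function names, sizes, and the input scaling are illustrative assumptions. Queries and keys are mapped to m positive features whose inner products estimate exp(q · k), so the (n, n) attention matrix is never formed and the cost grows linearly with sequence length n.

```python
import numpy as np


def positive_random_features(x, w):
    """Map rows of x (n, d) to m positive features so that, for w ~ N(0, I),
    E[phi(q) @ phi(k)] = exp(q @ k), the unnormalized softmax kernel."""
    m = w.shape[0]
    proj = x @ w.T                                        # (n, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)


def softmax_attention(q, k, v):
    """Exact softmax attention (no 1/sqrt(d) scaling, to match exp(q @ k)).
    Builds an (n, n) matrix, so the cost is quadratic in n."""
    scores = q @ k.T
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def random_feature_attention(q, k, v, w):
    """Approximate softmax attention in O(n * m * d) time, linear in n."""
    phi_q = positive_random_features(q, w)                # (n, m)
    phi_k = positive_random_features(k, w)                # (n, m), same w as queries
    kv = phi_k.T @ v                                      # (m, d_v), summarizes all keys
    normalizer = phi_q @ phi_k.sum(axis=0)                # (n,), approximates softmax denominators
    return (phi_q @ kv) / normalizer[:, None]


rng = np.random.default_rng(0)
n, d, m = 64, 8, 4096
q = 0.3 * rng.standard_normal((n, d))
k = 0.3 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
w = rng.standard_normal((m, d))                           # isotropic draw, i.e. NOT data-aware

approx = random_feature_attention(q, k, v, w)
exact = softmax_attention(q, k, v)
print(np.max(np.abs(approx - exact)))
```

The isotropic draw of `w` is the part a data-aware variant would change; everything else in the linear-time computation stays the same.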
AI · Neutral · arXiv – CS AI · May 28 · 5/10
🧠Researchers propose Supervised Distributional Reduction (SDR), a machine learning algorithm combining optimal transport theory with dependence maximization to create compact data representations that preserve both geometric structure and predictive information. The method extends the Fused Gromov-Wasserstein framework and offers applications in representation learning and adaptive kernel design for Gaussian Process modeling.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that standard transformer models with softmax attention can implement preconditioned Richardson iteration to solve Gaussian kernel ridge regression tasks during in-context learning. The theoretical construction and empirical validation reveal how transformers decompose nonlinear prediction into interpretable algorithmic steps, advancing mechanistic understanding of transformer capabilities.
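Richardson iteration itself is easy to state: to solve the kernel ridge regression system (K + λI)α = y, repeat α ← α + ωP(y - (K + λI)α) for a preconditioner P and step size ω. The following is a plain NumPy sketch of that solver on 1-D inputs, not the paper's transformer construction. The Jacobi preconditioner, the step-size rule, λ, the lengthscale, and the iteration count are all assumptions, chosen so that the iteration provably converges.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between 1-D input arrays a and b."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def preconditioned_richardson(A, y, P, num_iters=500):
    """Solve A @ alpha = y with alpha <- alpha + omega * P @ (y - A @ alpha)."""
    # For SPD A and P, the eigenvalues mu of P @ A are real and positive;
    # omega = 1 / ||P @ A||_2 keeps 0 < omega * mu <= 1, so the error contracts.
    omega = 1.0 / np.linalg.norm(P @ A, 2)
    alpha = np.zeros_like(y)
    for _ in range(num_iters):
        residual = y - A @ alpha
        alpha = alpha + omega * (P @ residual)
    return alpha


x_train = np.linspace(-3.0, 3.0, 20)
y_train = np.sin(x_train)
lam = 0.5                                             # ridge parameter (assumed)
A = gaussian_kernel(x_train, x_train) + lam * np.eye(len(x_train))
P = np.diag(1.0 / np.diag(A))                         # Jacobi preconditioner (assumed)
alpha = preconditioned_richardson(A, y_train, P)

x_test = np.array([0.25, 1.5])
y_pred = gaussian_kernel(x_test, x_train) @ alpha     # kernel ridge prediction
```

Each step needs only a matrix-vector product and a residual update, which is the kind of repeated, simple operation a stack of attention layers could plausibly emulate layer by layer.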
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose a non-linear transformer architecture that enables reinforcement learning agents to generalize across different domains through in-context learning, establishing a theoretical connection between transformers and kernel-based temporal difference learning. By interpreting transformers as operators in a reproducing kernel Hilbert space (RKHS), the work demonstrates that value functions from diverse domains can share a single set of weights, with MetaWorld experiments validating the approach.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce CLP-DD, a dataset distillation method optimized for frozen pre-trained vision models using closed-form linear probing. The technique achieves performance comparable or superior to existing methods while running 14× faster and using 87.5% less GPU memory on ImageNet-1K.
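The closed-form linear probe at the heart of CLP-DD can be illustrated with ridge regression on one-hot targets, solved in a single linear-algebra call rather than by gradient descent. This minimal sketch assumes a squared loss with a ridge penalty (the exact probe formulation is not given here) and substitutes synthetic Gaussian clusters for embeddings from a frozen pre-trained model. It does not implement the outer distillation loop that optimizes the synthetic data.

```python
import numpy as np


def closed_form_linear_probe(features, labels, num_classes, ridge=1e-2):
    """Fit a ridge-regression probe on frozen features in closed form.

    Solves (X^T X + ridge * I) W = X^T Y, where X has a bias column appended
    and Y holds one-hot labels. For simplicity the bias is also penalized.
    """
    n, d = features.shape
    targets = np.eye(num_classes)[labels]                 # (n, C) one-hot
    x = np.hstack([features, np.ones((n, 1))])            # (n, d + 1)
    gram = x.T @ x + ridge * np.eye(d + 1)
    w = np.linalg.solve(gram, x.T @ targets)              # (d + 1, C)
    return w[:-1], w[-1]


def predict(features, weights, bias):
    """Return the class with the largest probe output."""
    return np.argmax(features @ weights + bias, axis=1)


# Synthetic stand-in for frozen-backbone embeddings (hypothetical data).
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 16, 30
centers = 4.0 * rng.standard_normal((num_classes, dim))


def sample(count):
    labels = np.repeat(np.arange(num_classes), count)
    feats = centers[labels] + rng.standard_normal((labels.size, dim))
    return feats, labels


train_x, train_y = sample(per_class)
test_x, test_y = sample(20)
weights, bias = closed_form_linear_probe(train_x, train_y, num_classes)
accuracy = np.mean(predict(test_x, weights, bias) == test_y)
```

Because the probe is an explicit function of the training features, a distillation method can in principle differentiate through it instead of unrolling many gradient steps, which is one plausible source of the speed and memory savings.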
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce PKeX-Shapley, an algorithm that computes exact Shapley values for product-kernel machine learning models in quadratic time, eliminating the need for approximations. The method exploits the multiplicative structure of product kernels to attribute each feature in linear time without sampling or density estimation, and it extends beyond predictive models to statistical discrepancy and dependence measures such as MMD and HSIC.
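To see why product structure helps, consider the multiplicative game v(S) = Π_{j∈S} a_j with v(∅) = 1, where a_j is feature j's kernel factor. Its exact Shapley values are φ_j = (a_j - 1) Σ_s [s!(d-s-1)!/d!] e_s(a_{-j}), where e_s is the degree-s elementary symmetric polynomial of the other features' factors, and these can be computed for all d features in O(d²) without sampling. The sketch below assumes that absent features contribute a factor of 1; PKeX-Shapley's actual value function and algorithm may differ, and the brute-force routine is included only as a check. For a kernel model f(x) = Σ_i c_i Π_j k(x_j, z_ij), Shapley values are linear in the game, so per-support-point attributions are summed.

```python
import itertools
import math

import numpy as np


def product_game_shapley(a):
    """Exact Shapley values for v(S) = prod_{j in S} a_j, v(empty) = 1, in O(d^2)."""
    a = np.asarray(a, dtype=float)
    d = a.size
    # e[s] = coefficient of t**s in prod_k (1 + a_k t), built in O(d^2).
    e = np.zeros(d + 1)
    e[0] = 1.0
    for a_k in a:
        e[1:] = e[1:] + a_k * e[:-1]
    # Shapley weight s! (d - s - 1)! / d! for a coalition of size s.
    weights = np.array([1.0 / (d * math.comb(d - 1, s)) for s in range(d)])
    phi = np.empty(d)
    for j in range(d):
        # Divide prod_k (1 + a_k t) by (1 + a_j t): O(d) per feature.
        c = np.empty(d)
        c[0] = 1.0
        for s in range(1, d):
            c[s] = e[s] - a[j] * c[s - 1]
        phi[j] = (a[j] - 1.0) * np.dot(weights, c)
    return phi


def brute_force_shapley(value, d):
    """Reference implementation that enumerates every coalition (exponential)."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            weight = 1.0 / (d * math.comb(d - 1, size))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def kernel_model_shapley(x, support, coef, lengthscale=1.0):
    """Shapley values of f(x) = sum_i coef_i prod_j k(x_j, z_ij) with Gaussian
    per-feature factors, under the assumed 'absent factor = 1' convention."""
    phi = np.zeros(x.shape[0])
    for z, c in zip(support, coef):
        factors = np.exp(-(x - z) ** 2 / (2.0 * lengthscale ** 2))
        phi += c * product_game_shapley(factors)
    return phi


rng = np.random.default_rng(0)
a = rng.uniform(0.1, 1.0, size=5)
fast = product_game_shapley(a)
slow = brute_force_shapley(lambda s: np.prod([a[k] for k in s]), len(a))

support = rng.standard_normal((4, 5))
coef = rng.standard_normal(4)
x = rng.standard_normal(5)
phi = kernel_model_shapley(x, support, coef)
f_x = sum(c * np.prod(np.exp(-(x - z) ** 2 / 2.0)) for z, c in zip(support, coef))
```

Gaussian factors lie in (0, 1], which keeps the polynomial division step numerically well behaved in this sketch.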
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce Transformer Neural Process - Kernel Regression (TNP-KR), a scalable machine learning architecture that reduces the computational complexity of neural processes from O(n²) to O(n_c) while maintaining or exceeding accuracy. This enables processing 100K context points with more than 1M test points on a single GPU, making neural processes more feasible for large-scale applications.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠Researchers propose a new Personalized Federated Learning approach that automatically learns optimal collaboration weights between agents without prior knowledge of data heterogeneity. The method uses kernel mean embedding estimation to capture statistical relationships between agents and includes a practical implementation for communication-constrained federated settings.
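Kernel mean embeddings make "statistical relationships between agents" concrete: the distance between two agents' empirical embeddings in an RKHS is the maximum mean discrepancy (MMD), which can be estimated from samples alone. The sketch computes pairwise MMD² with a Gaussian kernel and turns it into collaboration weights with a softmax. That weighting rule, the temperature, and the kernel lengthscale are hypothetical illustrations, not the paper's learning procedure, and no communication constraints are modeled.

```python
import numpy as np


def rbf_gram(a, b, lengthscale=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale ** 2))


def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of ||mu_x - mu_y||^2 in the RKHS, where
    mu_x and mu_y are the empirical kernel mean embeddings of x and y."""
    return (rbf_gram(x, x, lengthscale).mean()
            + rbf_gram(y, y, lengthscale).mean()
            - 2.0 * rbf_gram(x, y, lengthscale).mean())


def collaboration_weights(datasets, temperature=0.1):
    """Hypothetical rule: agent i weights agent j by softmax_j(-MMD^2 / temperature)."""
    m = len(datasets)
    dist = np.array([[mmd_squared(datasets[i], datasets[j]) for j in range(m)]
                     for i in range(m)])
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
agents = [
    rng.standard_normal((100, 2)),                          # agent 0
    rng.standard_normal((100, 2)),                          # agent 1: same distribution as agent 0
    rng.standard_normal((100, 2)) + np.array([3.0, 0.0]),   # agent 2: shifted
]
W = collaboration_weights(agents)
```

With this data, agents 0 and 1 assign each other noticeably more weight than either assigns to the shifted agent 2, which is the qualitative behavior a heterogeneity-aware weighting should show.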