y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#generalization News & Analysis

77 articles tagged with #generalization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

77 articles
AIBearisharXiv – CS AI · 2d ago7/10
🧠

Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts

Researchers benchmarked five physics foundation models across 8 physical dynamics and 25 test regimes, revealing that current models function as conditional rather than universal generalists. The study demonstrates that model performance heavily depends on physical regime, temporal scale, and distribution shifts, with pretraining and scaling unable to reliably overcome these limitations.

AIBullisharXiv – CS AI · 2d ago7/10
🧠

Quantifying and Optimizing Simplicity via Polynomial Representations

Researchers introduce polynomial representations as a quantitative measure of neural network simplicity, demonstrating that the effective degree of these representations predicts generalization better than existing metrics. The approach yields a differentiable regularizer that improves performance across image classification, text tasks, vision-language models, and reinforcement learning.

AIBullisharXiv – CS AI · 2d ago7/10
🧠

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

Researchers introduce VLA-Pro, a framework that enhances vision-language-action models for robotics by storing and retrieving task-specific procedural memories during inference. The approach achieves dramatic performance gains—up to 207% improvement in simulation and raising real-world success rates from 5.8% to 65%—demonstrating significant progress in cross-task generalization for robotic manipulation.

AIBullisharXiv – CS AI · May 127/10
🧠

Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms

Researchers propose Latent Personality Alignment (LPA), a novel defense mechanism for large language models that achieves adversarial robustness by training on abstract personality traits rather than harmful examples. The method requires fewer than 100 training examples while matching the performance of traditional approaches using 150,000+ harmful prompts, and demonstrates superior generalization to unseen attack vectors.

AIBearisharXiv – CS AI · May 127/10
🧠

Benchmarking Compositional Generalisation for Machine Learning Interatomic Potentials

Researchers have created a benchmark to test whether machine learning interatomic potentials can generalize to unseen molecules by learning underlying chemical principles. The study reveals that state-of-the-art models, including foundation models trained on millions of molecules, fail significantly on out-of-distribution examples, with errors often 10x higher than on training data.

AIBullisharXiv – CS AI · May 117/10
🧠

Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

Researchers introduce HCL-GP, a machine learning approach that enables large language model agents to learn and reuse hierarchical task decompositions for improved performance on complex applications. The method achieves 98.2% accuracy on standard tasks and demonstrates significant improvements over static synthesis approaches, particularly benefiting open-source models through dynamic component reuse.

AINeutralarXiv – CS AI · May 117/10
🧠

Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization

Researchers demonstrate that neural networks fail at out-of-distribution (OOD) generalization not due to insufficient training data, but because the choice of feature representation fundamentally determines what extrapolation patterns a model can learn. The same architecture achieving identical in-distribution loss can differ by 520x out-of-distribution depending on how features are encoded, showing that correct feature engineering is necessary but not sufficient without appropriate model class constraints.

AINeutralarXiv – CS AI · May 97/10
🧠

Are Flat Minima an Illusion?

A research paper challenges the prevailing assumption that flat minima in neural network loss landscapes improve generalization, arguing instead that 'weakness'—the volume of function-compatible parameter configurations—is the true driver of generalization. The author demonstrates that flatness is reparameterization-dependent and thus not causally responsible for better performance, while weakness remains invariant across different parameterizations.

AIBullisharXiv – CS AI · May 97/10
🧠

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

Researchers propose ADAPT, an online data reweighting framework that dynamically adjusts training sample importance during LLM training rather than using static offline selection methods. This approach maintains data diversity while improving generalization, outperforming existing offline curation techniques on instruction tuning and large-scale pretraining tasks.

AIBullisharXiv – CS AI · May 47/10
🧠

Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies

Researchers introduce Preference Goal Tuning (PGT), a novel post-training framework that optimizes goal embeddings as continuous control variables rather than updating frozen policy parameters. Testing on Minecraft SkillForge demonstrates PGT achieves 72-81% relative improvements over expert-crafted prompts while showing superior generalization in out-of-distribution settings compared to traditional fine-tuning.

AIBullisharXiv – CS AI · Apr 157/10
🧠

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Researchers propose a label-free self-supervised reinforcement learning framework that enables language models to follow complex multi-constraint instructions without external supervision. The approach derives reward signals directly from instructions and uses constraint decomposition strategies to address sparse reward challenges, demonstrating strong performance across both in-domain and out-of-domain instruction-following tasks.

AIBullisharXiv – CS AI · Apr 157/10
🧠

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

Researchers introduce Ariadne, a framework demonstrating that Reinforcement Learning with Verifiable Rewards (RLVR) expands spatial reasoning capabilities in Vision-Language Models beyond their base distribution. Testing on synthetic mazes and real-world navigation benchmarks shows the technique enables models to solve previously unsolvable problems, suggesting genuine capability expansion rather than sampling efficiency.

AIBullisharXiv – CS AI · Apr 147/10
🧠

Proximal Supervised Fine-Tuning

Researchers propose Proximal Supervised Fine-Tuning (PSFT), a new method that applies trust-region constraints from reinforcement learning to improve how foundation models adapt to new tasks. The technique maintains model capabilities while fine-tuning, outperforming standard supervised fine-tuning on out-of-domain generalization tasks.

AINeutralarXiv – CS AI · Apr 107/10
🧠

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Researchers challenge the conventional wisdom that supervised finetuning (SFT) merely memorizes while reinforcement learning generalizes. Their analysis reveals that reasoning SFT with chain-of-thought supervision can generalize across domains, but success depends critically on optimization duration, data quality, and base model strength, with generalization improvements coming at the cost of degraded safety performance.

AIBearisharXiv – CS AI · Apr 67/10
🧠

Generalization Limits of Reinforcement Learning Alignment

Researchers discovered that reinforcement learning alignment techniques like RLHF have significant generalization limits, demonstrated through 'compound jailbreaks' that increased attack success rates from 14.3% to 71.4% on OpenAI's gpt-oss-20b model. The study provides empirical evidence that safety training doesn't generalize as broadly as model capabilities, highlighting critical vulnerabilities in current AI alignment approaches.

🏢 OpenAI
AINeutralarXiv – CS AI · Mar 267/10
🧠

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

Researchers propose a new symbolic-mechanistic approach to evaluate AI models that goes beyond accuracy metrics to detect whether models truly generalize or rely on shortcuts like memorization. Their method combines symbolic rules with mechanistic interpretability to reveal when models exploit patterns rather than learn genuine capabilities, demonstrated through NL-to-SQL tasks where a memorization model achieved 94% accuracy but failed true generalization tests.

AINeutralarXiv – CS AI · Mar 177/10
🧠

The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning

A comprehensive survey of 82 AI approaches to the ARC-AGI benchmark reveals consistent 2-3x performance drops across all paradigms when moving from version 1 to 2, with human-level reasoning still far from reach. While costs have fallen dramatically (390x in one year), AI systems struggle with compositional generalization, achieving only 13% on ARC-AGI-3 compared to near-perfect human performance.

🧠 GPT-5🧠 Opus
AINeutralarXiv – CS AI · Mar 67/10
🧠

On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks

Researchers introduce Non-Classical Network (NCnet), a classical neural architecture that exhibits quantum-like statistical behaviors through gradient competitions between neurons. The study reveals that multi-task neural networks can develop non-local correlations without explicit communication, providing new insights into deep learning training dynamics.

AINeutralarXiv – CS AI · Mar 57/10
🧠

Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective

New research reveals that difficult training examples, which are crucial for supervised learning, actually hurt performance in unsupervised contrastive learning. The study provides theoretical framework and empirical evidence showing that removing these difficult examples can improve downstream classification tasks.

AINeutralarXiv – CS AI · Mar 57/10
🧠

Effective Sample Size and Generalization Bounds for Temporal Networks

Researchers propose a new evaluation methodology for temporal deep learning that controls for effective sample size rather than raw sequence length. Their analysis of Temporal Convolutional Networks on time series data shows that stronger temporal dependence can actually improve generalization when properly evaluated, contradicting results from standard evaluation methods.

AINeutralarXiv – CS AI · Mar 46/102
🧠

The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks

Researchers identify the 'Malignant Tail' phenomenon where over-parameterized neural networks segregate signal from noise during training, leading to harmful overfitting. They demonstrate that Stochastic Gradient Descent pushes label noise into high-frequency orthogonal subspaces while preserving semantic features in low-rank subspaces, and propose Explicit Spectral Truncation as a post-hoc solution to recover optimal generalization.

AIBullisharXiv – CS AI · Mar 46/102
🧠

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Researchers developed a two-stage learning framework enabling robots to perform complex manipulation tasks like food peeling with over 90% success rates. The system combines force-aware imitation learning with human preference-based refinement, achieving strong generalization across different produce types using only 50-200 training examples.

AINeutralarXiv – CS AI · Mar 47/103
🧠

Loss Barcode: A Topological Measure of Escapability in Loss Landscapes

Researchers developed a new topological measure called the 'TO-score' to analyze neural network loss landscapes and understand how gradient descent optimization escapes local minima. Their findings show that deeper and wider networks have fewer topological obstructions to learning, and there's a connection between loss barcode characteristics and generalization performance.

AIBullisharXiv – CS AI · Mar 47/103
🧠

Self-Improving Loops for Visual Robotic Planning

Researchers developed SILVR, a self-improving system for visual robotic planning that uses video generative models to continuously enhance robot performance through self-collected data. The system demonstrates improved task performance across MetaWorld simulations and real robot manipulations without requiring human-provided rewards or expert demonstrations.

Page 1 of 4Next →