y0news

#generalization News & Analysis

46 articles tagged with #generalization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Researchers propose a label-free self-supervised reinforcement learning framework that enables language models to follow complex multi-constraint instructions without external supervision. The approach derives reward signals directly from instructions and uses constraint decomposition strategies to address sparse reward challenges, demonstrating strong performance across both in-domain and out-of-domain instruction-following tasks.
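The summary doesn't specify the paper's exact reward derivation; as a hedged illustration, here is one way constraint decomposition can densify an otherwise sparse instruction-following reward (all constraint checks below are invented for the example):

```python
# Each constraint in a multi-constraint instruction is checked
# independently; partial credit replaces the sparse all-or-nothing signal.

def make_constraints():
    # Invented example constraints an instruction might impose.
    return [
        lambda r: len(r.split()) <= 50,     # stay under a length limit
        lambda r: "summary" in r.lower(),   # mention a required keyword
        lambda r: r.strip().endswith("."),  # end with a period
    ]

def sparse_reward(response, constraints):
    # 1 only when every constraint holds -- a hard signal to learn from.
    return 1.0 if all(c(response) for c in constraints) else 0.0

def decomposed_reward(response, constraints):
    # Fraction of constraints satisfied -- a denser training signal.
    return sum(c(response) for c in constraints) / len(constraints)

cs = make_constraints()
response = "Here is a short summary of the paper"  # missing final period
print(sparse_reward(response, cs))      # sparse signal gives no credit
print(decomposed_reward(response, cs))  # decomposition still rewards 2 of 3
```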

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

Researchers introduce Ariadne, a framework demonstrating that Reinforcement Learning with Verifiable Rewards (RLVR) expands spatial reasoning capabilities in Vision-Language Models beyond their base distribution. Testing on synthetic mazes and real-world navigation benchmarks shows the technique enables models to solve previously unsolvable problems, suggesting genuine capability expansion rather than mere gains in sampling efficiency.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Proximal Supervised Fine-Tuning

Researchers propose Proximal Supervised Fine-Tuning (PSFT), a new method that applies trust-region constraints from reinforcement learning to improve how foundation models adapt to new tasks. The technique maintains model capabilities while fine-tuning, outperforming standard supervised fine-tuning on out-of-domain generalization tasks.
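The summary doesn't give PSFT's exact objective; a minimal sketch of the idea, assuming a PPO-style clip on the token-level ratio between the fine-tuned policy and a frozen reference model (all names here are illustrative, not the paper's API):

```python
import math

def clipped_sft_objective(logp_theta, logp_ref, eps=0.2):
    """Average clipped surrogate over ground-truth tokens.

    ratio = pi_theta(token) / pi_ref(token); clipping keeps updates
    inside a trust region around the reference model, as in PPO.
    """
    total = 0.0
    for lt, lr in zip(logp_theta, logp_ref):
        ratio = math.exp(lt - lr)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic min, so out-of-region ratios earn no extra credit.
        total += min(ratio, clipped)
    return total / len(logp_theta)

# A policy that drifts far above the reference is capped at 1 + eps.
print(clipped_sft_objective([0.0], [0.0]))  # ratio 1.0 -> objective 1.0
print(clipped_sft_objective([1.0], [0.0]))  # ratio e   -> clipped to 1.2
```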

AI · Neutral · arXiv – CS AI · 6d ago · 7/10

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Researchers challenge the conventional wisdom that supervised fine-tuning (SFT) merely memorizes while reinforcement learning generalizes. Their analysis reveals that reasoning SFT with chain-of-thought supervision can generalize across domains, but success depends critically on optimization duration, data quality, and base model strength, with generalization improvements coming at the cost of degraded safety performance.

AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

Generalization Limits of Reinforcement Learning Alignment

Researchers discovered that reinforcement learning alignment techniques like RLHF have significant generalization limits, demonstrated through 'compound jailbreaks' that increased attack success rates from 14.3% to 71.4% on OpenAI's gpt-oss-20b model. The study provides empirical evidence that safety training doesn't generalize as broadly as model capabilities, highlighting critical vulnerabilities in current AI alignment approaches.

๐Ÿข OpenAI
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

Researchers propose a new symbolic-mechanistic approach to evaluate AI models that goes beyond accuracy metrics to detect whether models truly generalize or rely on shortcuts like memorization. Their method combines symbolic rules with mechanistic interpretability to reveal when models exploit patterns rather than learn genuine capabilities, demonstrated through NL-to-SQL tasks where a memorization model achieved 94% accuracy but failed true generalization tests.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning

A comprehensive survey of 82 AI approaches to the ARC-AGI benchmark reveals consistent 2-3x performance drops across all paradigms when moving from version 1 to 2, with human-level reasoning still far out of reach. While costs have fallen dramatically (390x in one year), AI systems struggle with compositional generalization, achieving only 13% on ARC-AGI-3 compared to near-perfect human performance.

🧠 GPT-5 · 🧠 Opus
AI · Neutral · arXiv – CS AI · Mar 6 · 7/10

On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks

Researchers introduce Non-Classical Network (NCnet), a classical neural architecture that exhibits quantum-like statistical behaviors through gradient competitions between neurons. The study reveals that multi-task neural networks can develop non-local correlations without explicit communication, providing new insights into deep learning training dynamics.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Effective Sample Size and Generalization Bounds for Temporal Networks

Researchers propose a new evaluation methodology for temporal deep learning that controls for effective sample size rather than raw sequence length. Their analysis of Temporal Convolutional Networks on time series data shows that stronger temporal dependence can actually improve generalization when properly evaluated, contradicting results from standard evaluation methods.
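As a hedged aside, the standard effective-sample-size correction the summary alludes to is n_eff = n / (1 + 2·Σ ρ_k), where ρ_k is the lag-k autocorrelation; the truncation and clipping choices below are illustrative, not the paper's protocol:

```python
# Effective sample size of an autocorrelated series: strong temporal
# dependence shrinks n_eff well below the raw sequence length n.

def autocorr(x, k):
    # Sample lag-k autocorrelation.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return cov / var

def effective_sample_size(x, max_lag=10):
    # Clip negative lags to zero and truncate the sum at max_lag.
    n = len(x)
    denom = 1.0 + 2.0 * sum(max(autocorr(x, k), 0.0)
                            for k in range(1, max_lag + 1))
    return n / denom

# A trending (strongly dependent) series has far fewer effective
# samples than its raw length suggests.
trend = list(range(100))
print(effective_sample_size(trend) < len(trend))  # True
```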

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective

New research reveals that difficult training examples, which are crucial for supervised learning, actually hurt performance in unsupervised contrastive learning. The study provides a theoretical framework and empirical evidence showing that removing these difficult examples can improve downstream classification tasks.

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10

The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks

Researchers identify the 'Malignant Tail' phenomenon where over-parameterized neural networks segregate signal from noise during training, leading to harmful overfitting. They demonstrate that Stochastic Gradient Descent pushes label noise into high-frequency orthogonal subspaces while preserving semantic features in low-rank subspaces, and propose Explicit Spectral Truncation as a post-hoc solution to recover optimal generalization.
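A hedged, toy illustration of post-hoc spectral truncation (not the paper's Explicit Spectral Truncation itself): keep only the leading singular direction of a weight matrix and zero out the rest, i.e. discard the tail where the fitted label noise is claimed to concentrate. Power iteration stands in for a full SVD:

```python
# Rank-1 spectral truncation of a small matrix via power iteration.

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def top_singular_triplet(A, iters=200):
    # Power iteration on A^T A finds the leading right singular vector.
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        nv = norm(w)
        v = [x / nv for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return u, sigma, v

def rank_one_truncation(A):
    # Keep only sigma_1 * u_1 v_1^T; the spectral tail is dropped.
    u, s, v = top_singular_triplet(A)
    return [[s * ui * vj for vj in v] for ui in u]

# Diagonal example: the weaker direction (the "tail") is zeroed out.
print(rank_one_truncation([[3.0, 0.0], [0.0, 1.0]]))
```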

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Researchers developed a two-stage learning framework enabling robots to perform complex manipulation tasks like food peeling with over 90% success rates. The system combines force-aware imitation learning with human preference-based refinement, achieving strong generalization across different produce types using only 50-200 training examples.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10

Loss Barcode: A Topological Measure of Escapability in Loss Landscapes

Researchers developed a new topological measure called the 'TO-score' to analyze neural network loss landscapes and understand how gradient descent optimization escapes local minima. Their findings show that deeper and wider networks have fewer topological obstructions to learning, and that loss barcode characteristics correlate with generalization performance.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Self-Improving Loops for Visual Robotic Planning

Researchers developed SILVR, a self-improving system for visual robotic planning that uses video generative models to continuously enhance robot performance through self-collected data. The system demonstrates improved task performance across MetaWorld simulations and real robot manipulations without requiring human-provided rewards or expert demonstrations.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction

Researchers introduce PolySkill, a framework that enables AI agents to learn generalizable skills by separating abstract goals from concrete implementations, inspired by software engineering polymorphism. The method improves skill reuse by 1.7x and boosts success rates by up to 13.9% on web navigation tasks while reducing execution steps by over 20%.
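The paper's real interfaces aren't shown in this summary; a minimal sketch of the polymorphism idea, with every class and method name invented for illustration: the abstract skill fixes the goal, while site-specific subclasses supply concrete implementations the planner never sees.

```python
from abc import ABC, abstractmethod

class SearchSkill(ABC):
    """Abstract goal: find items matching a query on some website."""

    @abstractmethod
    def search(self, query):
        ...

class ShopSiteSearch(SearchSkill):
    """Concrete implementation bound to one site's catalog."""

    def __init__(self, catalog):
        self.catalog = catalog

    def search(self, query):
        return [item for item in self.catalog if query in item]

def run_task(skill, query):
    # The planner depends only on the abstract interface, so the same
    # plan is reused across sites by swapping the concrete skill.
    return skill.search(query)

print(run_task(ShopSiteSearch(["red shoe", "blue hat"]), "shoe"))
# -> ['red shoe']
```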

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks

Researchers propose that intrinsic task symmetries drive 'grokking', the sudden transition from memorization to generalization in neural networks. The study identifies a three-stage training process and introduces diagnostic tools to predict and accelerate the onset of generalization in algorithmic reasoning tasks.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Characterizing Pattern Matching and Its Limits on Compositional Task Structures

New research formally defines and analyzes pattern matching in large language models, revealing predictable limits in their ability to generalize on compositional tasks. The study provides mathematical boundaries for when pattern matching succeeds or fails, with implications for AI model development and understanding.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

MagicAgent: Towards Generalized Agent Planning

Researchers have developed MagicAgent, a series of foundation models designed for generalized AI agent planning that outperforms existing sub-100B models and even surpasses leading ultra-scale models like GPT-5.2. The models achieve superior performance through a novel synthetic data framework and two-stage training paradigm that addresses gradient interference in multi-task learning.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

EnterpriseBench Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

Surge AI introduces CoreCraft, the first environment in EnterpriseBench for training AI agents on realistic enterprise workflows. Training GLM 4.6 on this high-fidelity customer support simulation improved task performance from 25% to 37% and showed positive transfer to other benchmarks, demonstrating that quality training environments enable generalizable AI capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Learning Robust Intervention Representations with Delta Embeddings

Researchers propose Causal Delta Embeddings, a new method for learning robust AI representations from image pairs that improves out-of-distribution performance. The approach focuses on representing interventions in causal models rather than just scene variables, achieving significant improvements in synthetic and real-world benchmarks without additional supervision.
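As a hedged toy example (the attribute-count "embedding" below is invented, not the paper's learned encoder), the core delta idea represents the intervention as the difference between embeddings of a before/after pair, rather than either scene alone:

```python
def embed(scene):
    # Toy stand-in embedding: counts of a few scene attributes.
    return [scene.count("red"), scene.count("cube"), scene.count("light_on")]

def delta_embedding(before, after):
    # The delta isolates what the intervention changed, and is invariant
    # to everything the two scenes share.
    return [a - b for a, b in zip(embed(after), embed(before))]

before = ["red", "cube"]
after = ["red", "cube", "light_on"]
print(delta_embedding(before, after))  # -> [0, 0, 1]
```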

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Rethinking Cross-Modal Fine-Tuning: Optimizing the Interaction between Feature Alignment and Target Fitting

Researchers developed a theoretical framework to optimize cross-modal fine-tuning of pre-trained AI models, addressing the challenge of aligning new feature modalities with existing representation spaces. The approach introduces a novel concept of feature-label distortion and demonstrates improved performance over state-of-the-art methods across benchmark datasets.

AI · Bullish · Last Week in AI · Dec 17 · 7/10

LWiAI Podcast #228 - GPT 5.2, Scaling Agents, Weird Generalization

OpenAI has released GPT-5.2 amid intensifying competition in agentic AI development. The podcast episode discusses advances in scaling agent systems and explores unusual generalization behaviors in AI models.

๐Ÿข OpenAI๐Ÿง  GPT-5
AI · Neutral · arXiv – CS AI · 2d ago · 6/10

A Survey of Inductive Reasoning for Large Language Models

Researchers present the first comprehensive survey of inductive reasoning in large language models, categorizing improvement methods into post-training, test-time scaling, and data augmentation approaches. The survey establishes unified benchmarks and evaluation metrics for assessing how LLMs perform particular-to-general reasoning tasks that better align with human cognition.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Understanding Generalization in Role-Playing Models via Information Theory

Researchers introduce R-EMID, an information-theoretic metric to diagnose how distribution shifts degrade role-playing model performance in real-world deployments. The framework reveals that user shifts pose the greatest generalization risk, while co-evolving reinforcement learning provides the most effective mitigation strategy.

Page 1 of 2 · Next →