#supervised-learning News & Analysis

17 articles tagged with #supervised-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently

Researchers prove that reinforcement learning with verifiable rewards (RLVR) enables language models to learn backtracking strategies more efficient than those learned through supervised fine-tuning (SFT), giving an exponential computational advantage at inference time. The study models chain-of-thought reasoning as graph pathfinding and shows that RLVR trains models to identify difficult decision points, so that compute is allocated where it is needed.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

🧠

A Fiber Criterion for Representation Identifiability in Supervised Learning

A new theoretical framework formalizes when representation properties in supervised learning can be uniquely identified from input-output behavior alone. The research demonstrates that representation-level claims require additional assumptions beyond predictive performance, as auxiliary information can be added to representations while preserving predictor outputs, fundamentally challenging common assumptions about what supervised learning actually determines.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

🧠

R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.

🏢 Hugging Face · 🧠 GPT-4

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.

🧠 Llama
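
To make "affine transformation in logit space" concrete, here is a minimal sketch of a generic affine logit calibrator fitted by gradient descent on cross-entropy. It is not the paper's Supervised Calibration procedure: the function names, the synthetic data, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_affine_calibration(logits, labels, lr=0.1, steps=500):
    """Fit W, b so that softmax(logits @ W + b) matches the labels.

    Hypothetical helper: plain gradient descent on mean cross-entropy,
    starting from the identity map (W = I, b = 0).
    """
    n, k = logits.shape
    W = np.eye(k)
    b = np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(steps):
        probs = softmax(logits @ W + b)
        grad_z = (probs - onehot) / n      # d(loss)/d(calibrated logits)
        W -= lr * logits.T @ grad_z
        b -= lr * grad_z.sum(axis=0)
    return W, b

def demo():
    # Synthetic stand-in for LLM label logits with a systematic bias
    # toward class 0 (all numbers here are made up for illustration).
    rng = np.random.default_rng(0)
    n, k = 400, 3
    labels = rng.integers(0, k, size=n)
    clean = 2.0 * np.eye(k)[labels] + rng.normal(0.0, 1.0, size=(n, k))
    biased = clean + np.array([3.0, 0.0, 0.0])
    acc_before = (biased.argmax(axis=1) == labels).mean()
    W, b = fit_affine_calibration(biased, labels)
    acc_after = ((biased @ W + b).argmax(axis=1) == labels).mean()
    return acc_before, acc_after
```

Calling `demo()` returns accuracy before and after calibration on the synthetic data. The calibrator only sees logits and labels, so the underlying model is left untouched, which is the property that makes this kind of correction cheap to apply in few-shot settings.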

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 2

🧠

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Researchers describe a stealthy poisoning attack on medical language models during fine-tuning that degrades performance on specific medical topics without being detected. The attack injects poisoned rationales into the training data and proves more effective than traditional backdoor attacks or catastrophic-forgetting methods.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

The New Associationism: Lessons from Deep Learning

A new academic paper argues that modern deep learning systems validate associationist theories of human learning, showing that supervised learning with evaluative feedback underlies diverse AI systems from language models to game-playing agents. While this vindicates classical associationist principles of uniform, gradual error-driven learning, the paper emphasizes that contemporary AI success depends on computational architectures far beyond what classical associationists imagined.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

🧠

Scaling Performance and Low-Resource Annotation with Many-Shot In-Context Learning for Named Entity Recognition

Researchers demonstrate that large language models can match or exceed fine-tuned BERT performance on Named Entity Recognition tasks when provided with hundreds of in-context examples rather than just a few. The study shows many-shot in-context learning can also serve as a data annotation framework, generating high-quality training data that improves low-resource NER by ~10% F1 when used to fine-tune supervised models.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Post-training is (Massive) Supervised Learning

A new arXiv paper argues that current LLM post-training methods (SFT and RL) function primarily as distribution fitting rather than developing general capabilities, which the authors describe as a reversion to pre-BERT-era approaches. They demonstrate that randomly initialized models achieve non-trivial performance when fine-tuned on modern benchmarks, and suggest the field should shift toward training systems that learn how to learn rather than optimizing for specific tasks.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

Larch: Learned Query Optimization for Semantic Predicates

Larch is a new optimization framework that improves the efficiency of semantic SQL queries by reducing token usage and computational costs when processing unstructured data with Large Language Models. The framework uses two approaches—reinforcement learning and supervised learning—to optimize the order of filter evaluation, achieving 3x-19x token cost reductions compared to existing solutions.
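
Why filter order matters for token cost can be shown with a classical, non-learned baseline. The sketch below is not Larch's method: it assumes each filter has a known per-row token cost and pass probability, that filters are independent, and that evaluation short-circuits on the first failure. The function names and numbers are hypothetical.

```python
from itertools import permutations

def expected_cost(order, cost, sel):
    """Expected per-row cost of evaluating filters in `order`.

    cost[i]: tokens spent evaluating filter i on one row.
    sel[i]:  probability that a row passes filter i (assumed independent).
    A row only reaches filter i if it passed every earlier filter.
    """
    total, reach = 0.0, 1.0
    for i in order:
        total += reach * cost[i]
        reach *= sel[i]
    return total

def rank_order(cost, sel):
    # Classical rank heuristic: cheap, highly selective filters go first.
    # Requires sel[i] < 1 so the ratio is finite.
    return sorted(range(len(cost)), key=lambda i: cost[i] / (1.0 - sel[i]))

def brute_force_best(cost, sel):
    # Exhaustive check, only practical for a handful of filters.
    return min(permutations(range(len(cost))),
               key=lambda order: expected_cost(order, cost, sel))
```

With illustrative costs `[900, 120, 400]` and pass probabilities `[0.5, 0.9, 0.1]`, the rank order evaluates the third filter first and costs about 493 tokens per row, against 1140 for the original order. A learned optimizer is useful precisely when costs and selectivities are not known in advance, as is typical for LLM-evaluated predicates.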

AI · Neutral · arXiv – CS AI · Jun 8 · 5/10

🧠

Supervision versus Demonstration-Based In-Context Learning for Multiword Expression Classification

Researchers compared supervised learning and large language model prompting approaches for detecting Turkish idiomatic light verb constructions, finding that while zero-shot LLMs struggle with recall, few-shot demonstrations significantly improve performance. The study reveals that careful prompt engineering can match or exceed traditional supervised baselines, though results remain highly model-sensitive.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

🧠

Bayes-Sufficient Representations in Supervised Learning

A new theoretical framework defines Bayes-sufficient representations in supervised learning, establishing what information is genuinely required for optimal predictions based on loss functions. The work formalizes the concept of Bayes quotients and minimal representations, connecting representation learning to property elicitation theory with experimental validation across synthetic and real datasets.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

Researchers propose EKSFT, a novel fine-tuning method that selectively masks high-entropy and high-KL divergence tokens during supervised fine-tuning of large language models. The approach aims to preserve pre-trained model distributions while efficiently activating task-relevant capabilities in low-data regimes, demonstrating improved performance on mathematical reasoning benchmarks.
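
As a rough illustration of entropy- and KL-based token masking, the sketch below computes a per-token entropy and a per-token KL divergence from a base model, then excludes tokens above either threshold from the fine-tuning loss. Several choices here are assumptions, not details from the paper: "masking" is read as dropping a token from the loss, the KL direction is current-to-base, the two criteria are combined with a logical OR, and the thresholds and function names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def masked_sft_loss(logits, base_logits, targets, tau_entropy, tau_kl):
    """Mean negative log-likelihood over tokens that survive the mask.

    logits, base_logits: arrays of shape (num_tokens, vocab_size) from the
    model being tuned and from the frozen pre-trained model.
    targets: integer array of shape (num_tokens,).
    """
    logp = log_softmax(logits)
    logq = log_softmax(base_logits)
    p = np.exp(logp)
    entropy = -(p * logp).sum(axis=-1)        # H(p) per token
    kl = (p * (logp - logq)).sum(axis=-1)     # KL(p || q) per token
    keep = (entropy <= tau_entropy) & (kl <= tau_kl)
    nll = -logp[np.arange(len(targets)), targets]
    loss = (nll * keep).sum() / max(int(keep.sum()), 1)
    return loss, keep
```

In a real training loop the same mask would be applied to the per-token loss before backpropagation. The thresholds control how much of the pre-trained distribution is left undisturbed.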

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Researchers demonstrate that reinforcement learning (RL) preserves internal computational circuits in large language models better than supervised fine-tuning (SFT) during task adaptation. Using a new metric called differential circuit vulnerability on Qwen2.5-3B-Instruct, they reveal a mechanistic trade-off: SFT adapts faster but causes substantial circuit disruption and capability forgetting, while RL maintains base model circuits at the cost of slower learning.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Test Time Training for Supervised Causal Learning

Researchers propose Test-Time Training for Supervised Causal Learning (TTT-SCL), a framework that addresses critical limitations in causal discovery by generating test-specific training sets. The approach significantly narrows the performance gap between synthetic benchmarks and real-world applications while improving robustness to distribution shift.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

🧠

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

Researchers propose Supervised Distributional Reduction (SDR), a machine learning algorithm combining optimal transport theory with dependence maximization to create compact data representations that preserve both geometric structure and predictive information. The method extends the Fused Gromov-Wasserstein framework and offers applications in representation learning and adaptive kernel design for Gaussian Process modeling.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

🧠

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

MemRouter is a new memory management system for conversational AI agents that uses lightweight embedding-based routing instead of expensive LLM generation to decide which conversation turns to store. The approach achieves an F1 score of 52.0 versus 45.6 for LLM-based alternatives while reducing latency from 970 ms to 58 ms, suggesting that memory admission can be learned effectively through supervised classification rather than generative models.

AI · Bullish · arXiv – CS AI · Mar 2 · 4/10 · 6

🧠

Asymptotically Stable Quaternion-valued Hopfield-structured Neural Network with Periodic Projection-based Supervised Learning Rules

Researchers propose a quaternion-valued supervised learning Hopfield neural network (QSHNN) that leverages quaternions' geometric advantages for representing rotations and postures. The model introduces periodic projection-based learning rules to maintain quaternionic consistency while achieving high accuracy and fast convergence, with potential applications in robotics and control systems.