AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.
🏢 Hugging Face · 🧠 GPT-4
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.
🧠 Llama
AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠Researchers discovered a new stealth poisoning attack method targeting medical AI language models during fine-tuning that degrades performance on specific medical topics without detection. The attack injects poisoned rationales into training data, proving more effective than traditional backdoor attacks or catastrophic forgetting methods.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose EKSFT, a novel fine-tuning method that selectively masks high-entropy and high-KL divergence tokens during supervised fine-tuning of large language models. The approach aims to preserve pre-trained model distributions while efficiently activating task-relevant capabilities in low-data regimes, demonstrating improved performance on mathematical reasoning benchmarks.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers demonstrate that reinforcement learning (RL) preserves internal computational circuits in large language models better than supervised fine-tuning (SFT) during task adaptation. Using a new metric called differential circuit vulnerability on Qwen2.5-3B-Instruct, they reveal a mechanistic trade-off: SFT adapts faster but causes substantial circuit disruption and capability forgetting, while RL maintains base model circuits at the cost of slower learning.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose Test-Time Training for Supervised Causal Learning (TTT-SCL), a framework that addresses critical limitations in causal discovery by generating test-specific training sets. The approach significantly narrows the performance gap between synthetic benchmarks and real-world applications while improving robustness to distribution shifts.
AI · Neutral · arXiv – CS AI · 4d ago · 5/10
🧠Researchers propose Supervised Distributional Reduction (SDR), a machine learning algorithm combining optimal transport theory with dependence maximization to create compact data representations that preserve both geometric structure and predictive information. The method extends the Fused Gromov-Wasserstein framework and offers applications in representation learning and adaptive kernel design for Gaussian Process modeling.
AI · Neutral · arXiv – CS AI · May 4 · 6/10
🧠MemRouter is a new memory management system for conversational AI agents that uses lightweight embedding-based routing instead of expensive LLM generation to decide which conversation turns to store. The approach achieves an F1 score of 52.0 versus 45.6 for LLM-based alternatives while reducing latency from 970 ms to 58 ms, suggesting that memory admission can be learned effectively through supervised classification rather than generative models.
AI · Bullish · arXiv – CS AI · Mar 2 · 4/10 · 6
🧠Researchers propose a quaternion-valued supervised learning Hopfield neural network (QSHNN) that leverages quaternions' geometric advantages for representing rotations and postures. The model introduces periodic projection-based learning rules to maintain quaternionic consistency while achieving high accuracy and fast convergence, with potential applications in robotics and control systems.