76 articles tagged with #model-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers found that large language models align with human brain activity during creative thinking tasks, with alignment increasing with model size and idea originality. Different post-training approaches selectively reshape how LLMs align with creative versus analytical neural patterns in humans.
🧠 Llama
AI · Bullish · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and achieves state-of-the-art results on mathematical reasoning benchmarks like MathVision using Qwen2.5-VL-7B.
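The summary does not give CSRS's exact reward function, but the binary-versus-continuous distinction it describes can be sketched as below. The function names and the distance-based soft reward are illustrative assumptions, not the paper's formulation:

```python
import math

def binary_reward(pred: float, target: float, tol: float = 1e-6) -> float:
    """Binary reward: full credit only on an exact match, zero otherwise."""
    return 1.0 if abs(pred - target) < tol else 0.0

def soft_reward(pred: float, target: float, temperature: float = 2.0) -> float:
    """Continuous reward: decays smoothly with the error, so near-misses
    still contribute a learning signal during resampling."""
    return math.exp(-abs(pred - target) / temperature)

# A near-miss gets zero credit under the binary reward but partial
# credit under the continuous one.
print(binary_reward(9.9, 10.0))              # 0.0
print(round(soft_reward(9.9, 10.0), 3))
```

The point of a continuous signal is that almost-correct rollouts are no longer indistinguishable from completely wrong ones, which is one plausible way to reduce feedback bias in self-evolution loops.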
AI · Neutral · arXiv · CS AI · Apr 6 · 7/10
🧠 Researchers developed Debiasing-DPO, a new training method that reduces harmful biases in large language models by 84% while improving accuracy by 52%. The study found that LLMs can shift predictions by up to 1.48 points when exposed to irrelevant contextual information like demographics, highlighting critical risks for high-stakes AI applications.
🧠 Llama
AI · Bullish · arXiv · CS AI · Mar 26 · 7/10
🧠 Researchers demonstrate that PLDR-LLMs trained at self-organized criticality exhibit enhanced reasoning capabilities at inference time. The study shows that reasoning ability can be quantified using an order parameter derived from global model statistics, with models performing better when this parameter approaches zero at criticality.
AI · Neutral · arXiv · CS AI · Mar 26 · 7/10
🧠 Researchers have developed techniques to mitigate many-shot jailbreaking (MSJ) attacks on large language models, where attackers use numerous examples to override safety training. Combined fine-tuning and input sanitization approaches significantly reduce MSJ effectiveness while maintaining normal model performance.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers identified that repetitive safety training data causes large language models to develop false refusals, where benign queries are incorrectly declined. They developed FlowLens, a PCA-based analysis tool, and proposed Variance Concentration Loss (VCL) as a regularization technique that reduces false refusals by over 35 percentage points while maintaining performance.
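The summary does not spell out the VCL formula, but a PCA-based regularizer plausibly targets how much of the hidden-state variance collapses onto a single direction. A minimal sketch of that quantity, with the function name and the synthetic data being my own illustration:

```python
import numpy as np

def variance_concentration(hidden: np.ndarray) -> float:
    """Fraction of total variance captured by the top principal
    component of a batch of hidden states, shape (batch, dim)."""
    centered = hidden - hidden.mean(axis=0)
    # Eigenvalues of the covariance matrix are the PCA variances.
    cov = centered.T @ centered / (len(hidden) - 1)
    eigvals = np.linalg.eigvalsh(cov)
    return float(eigvals[-1] / eigvals.sum())

rng = np.random.default_rng(0)
iso = rng.normal(size=(256, 16))                  # isotropic activations
collapsed = np.outer(rng.normal(size=256), rng.normal(size=16))
collapsed += 0.01 * rng.normal(size=(256, 16))    # nearly rank-1

print(variance_concentration(iso) < variance_concentration(collapsed))  # True
```

Penalizing a quantity like this during safety fine-tuning would discourage representations of benign and harmful prompts from collapsing onto one "refuse" axis, which is consistent with the false-refusal mechanism the paper describes.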
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers identify a fundamental flaw in large language models called 'Rung Collapse' where AI systems achieve correct answers through flawed causal reasoning that fails under distribution shifts. They propose Epistemic Regret Minimization (ERM) as a solution that penalizes incorrect reasoning processes independently of task success, showing 53-59% recovery of reasoning errors in experiments across six frontier LLMs.
🧠 GPT-5
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers have developed a novel method to enhance large language model reasoning capabilities using supervision from weaker models, achieving 94% of expensive reinforcement learning gains at a fraction of the cost. This weak-to-strong supervision paradigm offers a promising alternative to costly traditional methods for improving LLM reasoning performance.
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers discovered that AI language models hallucinate not from failing to detect uncertainty, but from inability to integrate uncertainty signals into output generation. The study shows models can identify uncertain inputs internally, but these signals become geometrically amplified yet functionally silent due to weak coupling with output layers.
AI · Neutral · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers propose the Superficial Safety Alignment Hypothesis (SSAH), suggesting that AI safety alignment in large language models can be understood as a binary classification task of fulfilling or refusing user requests. The study identifies four types of critical components at the neuron level that establish safety guardrails, enabling models to retain safety attributes while adapting to new tasks.
AI · Bullish · arXiv · CS AI · Mar 12 · 7/10
🧠 Researchers propose ROVA, a new training framework that improves vision-language models' robustness in real-world conditions by up to 24% accuracy gains. The framework addresses performance degradation from weather, occlusion, and camera motion that can cause up to 35% accuracy drops in current models.
AI · Bullish · arXiv · CS AI · Mar 9 · 7/10
🧠 Researchers propose a three-stage pipeline to train Large Language Models to efficiently provide calibrated uncertainty estimates for their responses. The method uses entropy-based scoring, Platt scaling calibration, and reinforcement learning to enable models to reason about uncertainty without computationally expensive post-hoc methods.
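Two of the named ingredients are standard and easy to sketch: entropy of the model's token distribution as a raw uncertainty score, and Platt scaling (a fitted logistic function) to turn that score into a calibrated probability. The parameter values below are hypothetical; in practice a and b would be fit on held-out data:

```python
import math

def entropy(probs):
    """Shannon entropy of a token distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def platt(score, a, b):
    """Platt scaling: map a raw score to a calibrated probability
    of correctness via a fitted logistic function."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# A peaked distribution has low entropy (the model is confident) ...
confident = entropy([0.97, 0.01, 0.01, 0.01])
# ... a flat one has high entropy (the model is uncertain).
uncertain = entropy([0.25, 0.25, 0.25, 0.25])
print(confident < uncertain)  # True

# With illustrative fitted parameters, lower entropy maps to a
# higher calibrated probability of being correct.
print(platt(confident, a=2.0, b=-1.0) > platt(uncertain, a=2.0, b=-1.0))  # True
```

The pipeline's novelty per the summary is doing this inside training (via RL) rather than as a post-hoc calibration pass; the sketch only shows the scoring and calibration primitives.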
AI · Bullish · arXiv · CS AI · Mar 9 · 7/10
🧠 Researchers propose FLoRG, a new federated learning framework for efficiently fine-tuning large language models that reduces communication overhead by up to 2041x while improving accuracy. The method uses Gram matrix aggregation and Procrustes alignment to solve aggregation errors and decomposition drift issues in distributed AI training.
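The summary does not detail FLoRG's aggregation rule, but the Procrustes step it names is a standard primitive: find the orthogonal rotation that best maps one client's low-rank factor onto another's basis before averaging, so factors that differ only by a rotation are not destructively summed. A minimal sketch under that assumption:

```python
import numpy as np

def procrustes_align(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the rotation R minimizing ||A @ R - B||_F,
    computed from the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(1)
B = rng.normal(size=(64, 8))
# Simulate two clients whose factors differ only by an unknown rotation.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
A = B @ Q.T

R = procrustes_align(A, B)
print(np.allclose(A @ R, B))  # True
```

After alignment, averaging A @ R with B is meaningful, whereas averaging A with B directly would mix incompatible bases; this is one plausible reading of the "decomposition drift" problem the paper targets.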
AI · Bullish · arXiv · CS AI · Mar 6 · 6/10
🧠 Researchers propose VISA (Value Injection via Shielded Adaptation), a new framework for aligning Large Language Models with human values while avoiding the 'alignment tax' that causes knowledge drift and hallucinations. The system uses a closed-loop architecture with value detection, translation, and rewriting components, demonstrating superior performance over standard fine-tuning methods and GPT-4o in maintaining factual consistency.
🧠 GPT-4
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
🧠 Gemini
AI · Bearish · arXiv · CS AI · Mar 5 · 6/10
🧠 Researchers have identified 'preference leakage,' a contamination problem in LLM-as-a-judge systems where evaluator models show bias toward related data generator models. The study found this bias occurs when judge and generator LLMs share relationships like being the same model, having inheritance connections, or belonging to the same model family.
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers introduce Visual Attention Score (VAS) to analyze multimodal reasoning models, discovering that higher visual attention correlates strongly with better performance (r=0.9616). They propose the AVAR framework, which achieves 7% performance gains on Qwen2.5-VL-7B across multimodal reasoning benchmarks.
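The exact VAS definition is not in the summary; a natural reading is the share of attention mass that text tokens place on visual tokens, which could then be correlated with task accuracy. A sketch under that assumption, with synthetic attention weights:

```python
import numpy as np

def visual_attention_score(attn: np.ndarray, visual_idx: np.ndarray) -> float:
    """Share of total attention mass placed on visual tokens.
    attn: (num_queries, num_keys) attention weights, rows sum to 1."""
    return float(attn[:, visual_idx].sum() / attn.sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
visual_idx = np.arange(6)  # suppose keys 0-5 are image-patch tokens

vas = visual_attention_score(attn, visual_idx)
print(0.0 < vas < 1.0)  # True
```

Correlating per-example scores like this against benchmark accuracy (e.g. with `np.corrcoef`) is the kind of analysis that would yield the reported r=0.9616.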
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers introduce DCR (Discernment via Contrastive Refinement), a new method to reduce over-refusal in safety-aligned large language models. The approach helps LLMs better distinguish between genuinely toxic and seemingly toxic prompts, maintaining safety while improving helpfulness without degrading general capabilities.
AI · Neutral · arXiv · CS AI · Mar 5 · 6/10
🧠 Researchers have identified Order-to-Space Bias (OTS) in modern image generation models, where the order in which entities are mentioned in text prompts incorrectly determines spatial layout and role assignments. The study introduces OTS-Bench to measure this bias and demonstrates that targeted fine-tuning and early-stage interventions can reduce the problem while maintaining generation quality.
AI · Bullish · arXiv · CS AI · Mar 4 · 7/10
🧠 Researchers introduce Paramβ, a novel method for transferring post-training capabilities to updated language models without additional training costs. The technique achieves 95% of the performance of traditional post-training by computing weight differences between base and post-trained models, offering significant cost savings for AI model development.
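The weight-difference idea the summary describes can be sketched concretely: subtract the old base weights from the old post-trained weights to get a "post-training delta," then add that delta to the new base model. Function and tensor names below are illustrative, not from the paper:

```python
import numpy as np

def transfer_post_training(base_old, post_old, base_new):
    """Apply the post-training weight delta of an old model release to
    a freshly updated base model, parameter tensor by tensor."""
    return {name: base_new[name] + (post_old[name] - base_old[name])
            for name in base_new}

# Toy 'models': dicts mapping parameter names to weight tensors.
base_old = {"w": np.ones((2, 2))}
post_old = {"w": np.ones((2, 2)) * 1.5}   # post-training added +0.5
base_new = {"w": np.ones((2, 2)) * 2.0}   # updated pretraining

merged = transfer_post_training(base_old, post_old, base_new)
print(merged["w"][0, 0])  # 2.5
```

This only works when old and new releases share an architecture and parameter naming, which is the typical situation for incremental base-model updates.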
AI · Bearish · arXiv · CS AI · Mar 4 · 6/10
🧠 New research reveals that current large language models struggle with collaborative reasoning, showing that 'stronger' models are often more fragile when distracted by misleading information. The study of 15 LLMs found they fail to effectively leverage guidance from other models, with success rates below 9.2% on challenging problems.
AI · Bullish · arXiv · CS AI · Mar 4 · 7/10
🧠 Researchers introduce Skywork-Reward-V2, a suite of AI reward models trained on SynPref-40M, a massive 40-million preference pair dataset created through human-AI collaboration. The models achieve state-of-the-art performance across seven major benchmarks by combining human annotation quality with AI scalability for better preference learning.
AI · Neutral · arXiv · CS AI · Mar 3 · 7/10
🧠 Researchers identified that fine-tuning non-robust pretrained AI models with robust objectives can lead to poor performance, termed 'suboptimal transfer.' They propose Epsilon-Scheduling, a novel training technique that adjusts perturbation strength during training to improve both task adaptation and adversarial robustness.
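The summary says only that perturbation strength is adjusted over training; one simple instantiation of such a schedule is a linear warmup of the adversarial budget ε. The shape and parameters below are an assumption for illustration, not the paper's schedule:

```python
def epsilon_schedule(step: int, total_steps: int, eps_max: float,
                     warmup_frac: float = 0.5) -> float:
    """Ramp the adversarial perturbation budget linearly from 0 to
    eps_max over the first warmup_frac of training, then hold it."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return eps_max * step / warmup_steps
    return eps_max

# Early fine-tuning sees weak perturbations (favoring task adaptation);
# later steps see the full budget (favoring robustness).
print(epsilon_schedule(0, 1000, eps_max=8 / 255))               # 0.0
print(epsilon_schedule(250, 1000, eps_max=8 / 255) < 8 / 255)   # True
print(epsilon_schedule(900, 1000, eps_max=8 / 255) == 8 / 255)  # True
```

The intuition matches the stated problem: hitting a non-robust checkpoint with full-strength perturbations from step one disrupts task transfer, while ramping ε lets the model adapt first.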
AI · Bearish · arXiv · CS AI · Mar 3 · 7/10
🧠 New research reveals that benchmark contamination in language reasoning models (LRMs) is extremely difficult to detect, allowing developers to easily inflate performance scores on public leaderboards. The study shows that reinforcement learning methods like GRPO and PPO can effectively conceal contamination signals, undermining the integrity of AI model evaluations.
$NEAR
AI · Neutral · arXiv · CS AI · Mar 3 · 7/10
🧠 Researchers discovered that the traditional cross-entropy scaling law for large language models breaks down at very large scales because only one component (error-entropy) actually follows power-law scaling, while other components remain constant. This finding explains why model performance improvements become less predictable as models grow larger and establishes a new error-entropy scaling law for better understanding LLM development.
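The described decomposition can be illustrated numerically: model a loss curve as a constant term plus a power-law error term, L(N) = E0 + A·N^(−α). Only the second term shrinks with scale, so L itself bends away from a straight line on a log-log plot while L − E0 stays perfectly linear. The constants here are illustrative, not fitted to real models:

```python
import numpy as np

E0, A, alpha = 1.7, 40.0, 0.3          # illustrative constants
N = np.array([1e8, 1e9, 1e10, 1e11])   # parameter counts
L = E0 + A * N ** (-alpha)             # synthetic loss curve

# log(L - E0) = log(A) - alpha * log(N): exactly linear, slope -alpha.
slope = np.polyfit(np.log(N), np.log(L - E0), 1)[0]
print(round(slope, 3))  # -0.3
```

Fitting a single power law to L directly would look fine at small N but drift as the constant term starts to dominate, which is the breakdown the paper attributes to the traditional scaling law.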