58 articles tagged with #mathematical-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv - CS AI · Feb 27 · 6/10 · 6
Researchers introduce UpSkill, a new training method that uses Mutual Information Skill Learning to improve large language models' ability to generate diverse correct responses across multiple attempts. The technique shows ~3% improvements in pass@k metrics on mathematical reasoning tasks using models like Llama 3.1-8B and Qwen 2.5-7B without degrading single-attempt accuracy.
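For readers unfamiliar with pass@k: it is the probability that at least one of k sampled responses is correct. The sketch below uses the commonly cited unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The summary does not say how the UpSkill authors compute the metric, so treat this as an assumption; the function name `pass_at_k` is illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled responses, c of which are correct.

    Assumption: this is the widely used unbiased estimator
    1 - C(n - c, k) / C(n, k); the summarized paper may compute it differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k responses are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples per problem, 3 of them correct.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.4f}")
```

With k = 1 the estimate reduces to c / n, the single-attempt accuracy, which is why a method can raise pass@k for larger k while leaving single-attempt accuracy unchanged.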
AI · Neutral · arXiv - CS AI · Feb 27 · 6/10 · 6
Researchers introduced ReasoningMath-Plus, a new benchmark with 150 problems designed to evaluate structural mathematical reasoning in large language models. The study reveals that while leading LLMs achieve relatively high final-answer accuracy, they perform significantly worse on process-level evaluation metrics, indicating that answer-only assessments may overestimate actual reasoning capabilities.
AI · Neutral · arXiv - CS AI · Feb 27 · 5/10 · 8
Researchers introduce Soft Sequence Policy Optimization (SSPO), a new reinforcement learning method for training large language models that improves upon existing policy optimization approaches. The technique uses soft gating functions and sequence-level importance sampling to enhance training stability and performance in mathematical reasoning tasks.
AI · Bullish · Apple Machine Learning · Feb 25 · 6/10 · 3
Researchers propose Constructive Circuit Amplification, a new method for improving LLM mathematical reasoning by directly targeting and strengthening specific neural network subnetworks (circuits) responsible for particular tasks. This approach builds on findings that model improvements through fine-tuning often result from amplifying existing circuits rather than creating new capabilities.
AI · Bullish · Google DeepMind Blog · Oct 23 · 6/10 · 10
Google is rolling out the Deep Think feature in the Gemini app for Google AI Ultra subscribers. The company is also providing select mathematicians with access to the full Gemini 2.5 Deep Think model that was entered into the International Mathematical Olympiad competition.
AI · Bullish · Synced Review · Apr 30 · 6/10 · 6
DeepSeek AI has released DeepSeek-Prover-V2, an open-source large language model specifically designed for Lean 4 theorem proving. The model employs a recursive proof search methodology and uses DeepSeek-V3 for training data generation with reinforcement learning, achieving top results on the MiniF2F benchmark.
AI · Neutral · arXiv - CS AI · Mar 11 · 5/10
Researchers developed MathQ-Verify, a five-stage pipeline that validates mathematical questions for training AI models, addressing the overlooked problem of ill-posed or under-specified math problems in datasets. The system achieves 90% precision and 63% recall, improving F1 scores by up to 25 percentage points over baseline methods.
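As a quick check on how the reported figures relate: F1 is the harmonic mean of precision and recall. The sketch below assumes the 90% precision and 63% recall come from the same evaluation run, which the summary does not confirm; the helper name `f1_score` is illustrative and does not come from the MathQ-Verify work.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0.0:
        # Conventional value when there are no true positives at all.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumption: both figures in the summary refer to the same evaluation.
    print(f"F1 = {f1_score(0.90, 0.63):.3f}")
```

Under that assumption the implied F1 is about 0.74. Because the harmonic mean is pulled toward the lower of the two values, the 63% recall limits F1 more than the 90% precision lifts it.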
AI · Neutral · OpenAI News · Jun 2 · 4/10 · 6
GamePad is introduced as a learning environment specifically designed for theorem proving applications. The platform appears to focus on providing educational tools and resources for mathematical proof development and validation.