AIBullisharXiv – CS AI · 5d ago7/10
🧠Researchers introduce KACE, a novel context engineering method that improves large language models' mathematical reasoning by separating knowledge storage from usage through difficulty and domain-based organization. The approach achieves 62.2% accuracy on AIME 2025, significantly outperforming existing self-consistency methods while maintaining comparable computational efficiency.
AIBullisharXiv – CS AI · Apr 77/10
🧠Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and achieves state-of-the-art results on mathematical reasoning benchmarks like MathVision using Qwen2.5-VL-7B.
AIBullisharXiv – CS AI · Mar 177/10
🧠Researchers introduce Prism, a new self-evolving AI reasoning system that prevents diversity collapse in problem generation by maintaining semantic coverage across mathematical problem spaces. The system achieved significant accuracy improvements over existing methods on mathematical reasoning benchmarks and generated 100k diverse mathematical questions.
AIBullisharXiv – CS AI · Mar 37/103
🧠Researchers introduce GAR (Generative Adversarial Reinforcement Learning), a new AI training framework that jointly trains problem generators and solvers in an adversarial loop for formal theorem proving. The method shows significant improvements in mathematical proof capabilities, with models achieving 4.20% average relative improvement on benchmark tests.
AIBullishOpenAI News · Dec 117/108
🧠OpenAI has released GPT-5.2, their most advanced model for mathematics and science applications, achieving state-of-the-art performance on benchmarks like GPQA Diamond and FrontierMath. The model demonstrates significant research capabilities, including solving open theoretical problems and generating reliable mathematical proofs.
AINeutralarXiv – CS AI · 2d ago6/10
🧠LeanMarathon introduces a multi-agent system that automates the formalization of research mathematics in Lean, solving long-horizon verification challenges through an evolving blueprint architecture. The system successfully formalized seven theorems across recent research papers spanning four Erdős problems without requiring manual verification shortcuts, demonstrating progress toward reliable AI co-mathematics.
AIBullisharXiv – CS AI · 2d ago6/10
🧠Goedel-Architect is a new AI framework for formal theorem proving that uses blueprint generation and refinement to achieve state-of-the-art results on mathematical benchmarks. Built on DeepSeek-V4-Flash, it demonstrates significant improvements in solving complex mathematical problems while maintaining cost efficiency up to 500x lower than comparable solutions.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers present the E∆-MHC-Geo Transformer, a novel deep learning architecture that maintains orthogonality in residual connections across all input values and parameters, outperforming existing methods like JPmHC and GPT on stability and rotation metrics while using 33% fewer layers.
AINeutralarXiv – CS AI · Apr 206/10
🧠Researchers propose DeepInsightTheorem, a framework that teaches large language models to improve informal theorem proving by explicitly extracting and learning core mathematical techniques. The hierarchical dataset combined with a multi-stage training strategy enables LLMs to perform more insightful mathematical reasoning, outperforming existing baseline approaches on challenging benchmarks.