169 articles tagged with #reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating it with a 9.5-billion-token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, and the models and datasets are being open-sourced for broader use.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9
Researchers introduce Surgical Post-Training (SPoT), a new method to improve Large Language Model reasoning while preventing catastrophic forgetting. SPoT achieved a 6.2% accuracy improvement on Qwen3-8B using only 4k data pairs and 28 minutes of training, offering a more efficient alternative to traditional post-training approaches.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
Researchers introduced GOME, an AI agent that uses gradient-based optimization instead of tree search for machine learning engineering tasks, achieving a 35.1% success rate on MLE-Bench. The study shows gradient-based approaches outperform tree search as AI reasoning capabilities improve, suggesting the method will become more effective as LLMs advance.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5
Researchers introduced GateLens, an LLM-based system that uses Relational Algebra as an intermediate layer to analyze complex tabular data more reliably than traditional approaches. In automotive software analytics, the system reduced analysis time by over 80% while maintaining high accuracy, outperforming existing Chain-of-Thought methods.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
Researchers introduce AdaBack, a new reinforcement learning algorithm that uses partial supervision to help AI models learn complex reasoning tasks. The method dynamically adjusts the amount of guidance provided to each training sample, enabling models to solve mathematical reasoning problems that traditional supervised learning and reinforcement learning methods cannot handle.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers propose Quantile Advantage Estimation (QAE) to stabilize Reinforcement Learning with Verifiable Rewards (RLVR) for large language model reasoning. The method replaces mean baselines with group-wise K-quantile baselines to prevent entropy collapse and explosion, showing sustained improvements on mathematical reasoning tasks.
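To make the baseline swap concrete, here is a minimal sketch, not the paper's implementation. It assumes a group of scalar rewards sampled for one prompt and compares a mean baseline with a K-quantile baseline. The function names, the linear-interpolation quantile, and the absence of any normalization are illustrative assumptions that the summary does not specify.

```python
from typing import List


def quantile(values: List[float], k: float) -> float:
    """Empirical k-quantile (0 <= k <= 1) with linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    ordered = sorted(values)
    pos = k * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def mean_baseline_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each sample relative to the group mean."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def quantile_baseline_advantages(rewards: List[float], k: float) -> List[float]:
    """Advantage of each sample relative to the group's k-quantile (hypothetical helper)."""
    baseline = quantile(rewards, k)
    return [r - baseline for r in rewards]


# One prompt, four sampled responses, binary verifiable rewards.
group = [0.0, 0.0, 0.0, 1.0]
print(mean_baseline_advantages(group))           # every sample gets a nonzero signal
print(quantile_baseline_advantages(group, 0.5))  # only the rare success is pushed
```

With a median baseline on a mostly failing group, the failed samples receive zero advantage and only the successful one is reinforced, whereas the mean baseline also pushes down every failure. Which K the authors use, and how it is chosen, is not stated in the summary.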
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers developed Set Supervised Fine-Tuning (SSFT) and Global Forking Policy Optimization (GFPO) methods to improve large language model reasoning by enabling parallel processing through 'global forking tokens.' The techniques preserve diverse reasoning modes and demonstrate superior performance on math and code generation benchmarks compared to traditional fine-tuning approaches.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
Researchers propose ChainMPQ, a training-free method to reduce relation hallucinations in Large Vision-Language Models (LVLMs) by using interleaved text-image reasoning chains. The approach addresses the most common but least studied type of AI hallucination by sequentially analyzing subjects, objects, and their relationships through multi-perspective questioning.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers introduce WavefrontDiffusion, a new dynamic decoding approach for Diffusion Language Models that improves text generation quality by expanding from finalized positions rather than using fixed blocks. The method achieves state-of-the-art performance on reasoning and code generation benchmarks while maintaining computational efficiency equivalent to existing block-based methods.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 18
Researchers developed RD-MLDG, a new framework that uses multimodal large language models with reasoning chains to improve domain generalization in deep learning. The approach addresses challenges in cross-domain visual recognition by leveraging reasoning capabilities rather than just visual feature invariance, achieving state-of-the-art performance on standard benchmarks.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 18
Researchers introduce TTE-v2, a new multimodal retrieval framework that achieves state-of-the-art performance by incorporating reasoning steps during retrieval and reranking. The approach demonstrates that scaling reasoning tokens rather than model size can significantly improve performance, with TTE-v2-7B reaching 75.7% accuracy on the MMEB-V2 benchmark.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 13
Researchers propose an LLM-driven framework for generating multi-turn task-oriented dialogues to create more realistic reasoning benchmarks. The framework addresses limitations in current AI evaluation methods by producing synthetic datasets that better reflect real-world complexity and contextual coherence.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 14
A comprehensive study of 504 AI model configurations reveals that the benefit of reasoning in large language models is highly task-dependent: simple tasks like binary classification degrade by up to 19.9 percentage points, while complex 27-class emotion recognition improves by up to 16.0 points. The research challenges the assumption that reasoning universally improves AI performance across all language tasks.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17
Researchers introduce MITS (Mutual Information Tree Search), a new framework that improves reasoning capabilities in large language models using information-theoretic principles. The method uses pointwise mutual information for step-wise evaluation and achieves better performance while being more computationally efficient than existing tree search methods like Tree-of-Thought.
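For reference, the standard definition of pointwise mutual information between two outcomes $x$ and $y$ is given below. How MITS instantiates $x$ and $y$ (for example, a question and a candidate reasoning step) is an assumption here and is not stated in the summary.

```latex
\mathrm{PMI}(x; y)
  = \log \frac{p(x, y)}{p(x)\, p(y)}
  = \log \frac{p(y \mid x)}{p(y)}
```

A positive value means $y$ is more likely given $x$ than it is on its own, which is the sense in which a score of this kind can rank candidate steps.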
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 21
Researchers propose a training-free solution to reduce hallucinations in multimodal AI models by rebalancing attention between perception and reasoning layers. The method achieves a 4.2% improvement in reasoning accuracy with minimal computational overhead.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
Researchers introduce Latent Self-Consistency (LSC), a new method for improving Large Language Model output reliability across both short and long-form reasoning tasks. LSC uses learnable token embeddings to select semantically consistent responses with only 0.9% computational overhead, outperforming existing consistency methods like Self-Consistency and Universal Self-Consistency.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
Researchers introduce MMKG-RDS, a framework that uses multimodal knowledge graphs to synthesize high-quality training data for improving AI model reasoning abilities. Testing on Qwen3 models showed a 9.2% improvement in reasoning accuracy, with applications to constructing complex benchmarks that involve tables and formulas.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
Researchers have developed PATRA, a new AI model that improves time series question answering by better understanding patterns like trends and seasonality. The model addresses limitations in existing LLM approaches that treat time series data as simple text or images, introducing pattern-aware mechanisms and balanced learning across tasks of varying difficulty.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
Researchers propose RL-aware distillation (RLAD), a new method to efficiently transfer knowledge from large language models to smaller ones during reinforcement learning training. The approach uses Trust Region Ratio Distillation (TRRD) to guide the student model only when doing so improves policy updates, outperforming existing distillation methods across reasoning benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8
Researchers developed a new framework called 'Stitching Noisy Diffusion Thoughts' that improves AI reasoning by combining the best parts of multiple solution attempts rather than just selecting complete answers. The method achieves an accuracy improvement of up to 23.8% on math and coding tasks while cutting computation time by a factor of 1.8 compared to existing approaches.
AI · Bullish · OpenAI News · Aug 5 · 6/10 · 6
OpenAI has released gpt-oss-120b and gpt-oss-20b, two open-weight language models under the Apache 2.0 license that deliver strong performance at low cost. The models excel at reasoning tasks and tool use while being optimized for efficient deployment on consumer hardware.
AI · Bullish · OpenAI News · Aug 5 · 6/10 · 4
Two new open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, have been released under the Apache 2.0 license. These models are available for use under a specific gpt-oss usage policy.
AI · Bullish · Hugging Face Blog · Jul 8 · 6/10 · 5
SmolLM3 is a new compact language model that combines multilingual capabilities with long-context reasoning abilities. The model appears to be designed for efficiency while maintaining strong performance across multiple languages and complex reasoning tasks.
AI · Bullish · Google DeepMind Blog · May 20 · 6/10 · 2
Google announces updates to its Gemini AI models, with Gemini 2.5 Pro maintaining its position as the preferred coding model for developers and 2.5 Flash receiving improvements. The company introduces Deep Think, an experimental enhanced reasoning mode for the 2.5 Pro model.
AI · Bullish · OpenAI News · Feb 2 · 6/10 · 5
OpenAI has launched an AI research agent that can synthesize large amounts of online information and complete complex multi-step research tasks through advanced reasoning capabilities. The tool is currently available to Pro users, with rollout planned for Plus and Team subscribers.