AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers propose ESPO, an optimization technique that improves large language model training by detecting and terminating failed reasoning trajectories early rather than forcing completion. The method reduces computational waste by over 20% while achieving superior performance on mathematical reasoning benchmarks compared to standard PPO training.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers propose Group-Query Latent Attention (GQLA), an advancement of DeepSeek's Multi-head Latent Attention that enables hardware-adaptive decoding through two algebraically equivalent inference paths without requiring model retraining. The innovation allows a single trained model to optimize performance across different hardware platforms—H100 GPUs and export-restricted H20 chips—while maintaining computational efficiency and supporting distributed tensor parallelism.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers present a systematic study of Attention-FFN Disaggregation (AFD), a technique that separates attention and expert layers across different GPU groups to optimize inference serving for Mixture-of-Experts language models. The framework demonstrates that AFD enables 4k tokens/s throughput on DeepSeek-V3.2 under strict latency constraints where traditional disaggregation approaches fail, providing design principles for scaling LLM infrastructure.
AI · Bearish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers discovered that chain-of-thought distillation—training smaller AI models to imitate larger models' reasoning—produces higher answer accuracy on medical benchmarks while simultaneously degrading reasoning quality. A Qwen3-8B student model improved from 74.7% to 84.4% accuracy on MedQA-USMLE, yet error rates in individual reasoning steps jumped from 30.6% to 50.3%, suggesting models learn to mimic expert-like output without grounding claims in sound logic.
AI · Bearish · Decrypt – AI · 3d ago · 7/10
🧠Chinese AI labs DeepSeek and Xiaomi have dramatically slashed prices on their frontier AI models, making them approximately 99% cheaper than comparable American offerings like GPT-4.5 and Claude Opus. This pricing strategy represents a significant shift in the competitive landscape, with Chinese providers pursuing aggressive cost-based competition while American labs maintain premium pricing models.
🧠 GPT-5 · 🧠 Claude
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce ReMoE, a router fine-tuning framework that optimizes Mixture-of-Experts language models for memory-constrained inference by increasing expert reuse and reducing storage I/O overhead. The approach improves expert reuse by 26% while maintaining performance, delivering up to 1.99× decode speedup on edge devices.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce MISA, an optimization technique that reduces computational costs in DeepSeek's sparse attention mechanism for large language models by treating indexer heads as a mixture-of-experts system. The method achieves 3.82x speedup on GPU inference while maintaining performance across benchmarks, addressing a key bottleneck in long-context LLM processing.
🏢 Nvidia
AI · Bullish · Blockonomi · May 4 · 7/10
🧠Morgan Stanley's latest survey ranks Alibaba as China's leading AI player, with 41% of CIOs selecting it as their top choice and the company experiencing over 40% cloud growth. The investment bank has set a $180 price target for BABA stock, signaling confidence in the company's AI dominance in the Chinese market.
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers have discovered a critical vulnerability in Large Reasoning Models (LRMs) like DeepSeek R1 and OpenAI o4-mini that allows attackers to inject harmful content into the reasoning process while keeping final answers unchanged. The Psychology-based Reasoning-targeted Jailbreak Attack (PRJA) framework achieves an 83.6% success rate by exploiting semantic triggers and psychological principles, revealing a previously understudied safety gap in AI systems deployed in high-stakes domains.
🏢 OpenAI
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers discovered that large reasoning models (LRMs) like DeepSeek R1 and Llama become significantly more vulnerable to adversarial attacks when presented with conflicting objectives or ethical dilemmas. Testing across 1,300+ prompts revealed that safety mechanisms break down when internal alignment values compete, with neural representations of safety and functionality overlapping under conflict.
🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
🧠A comprehensive study of the open language model ecosystem reveals that Chinese AI models, including Qwen and DeepSeek, have overtaken U.S.-developed models like Meta's Llama since summer 2025, with the gap continuing to widen. The research analyzes ~1.5K mainline open models across adoption metrics, market share, and performance to document this significant shift in AI development geography.
$ATOM · 🏢 Hugging Face · 🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
🧠A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.
🏢 Anthropic · 🧠 GPT-5 · 🧠 Claude
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.
🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers developed ODMA, a new memory allocation strategy that improves Large Language Model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.
AI · Neutral · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.
🧠 ChatGPT · 🧠 Claude · 🧠 Sonnet
AI · Bearish · arXiv – CS AI · Mar 12 · 7/10
🧠Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.
🧠 Gemini
AI · Bullish · Wired – AI · Mar 11 · 7/10
🧠Nvidia plans to invest $26 billion in building open-weight AI models, according to recent filings. The investment positions the GPU giant to compete directly with major AI companies like OpenAI, Anthropic, and DeepSeek in the foundation model space.
🏢 OpenAI · 🏢 Anthropic · 🏢 Nvidia
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean Satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 3
🧠Researchers introduce Tool Decathlon (Toolathlon), a comprehensive benchmark for evaluating AI language agents across 32 software applications and 604 tools in realistic, multi-step scenarios. The benchmark reveals significant limitations in current AI models, with the best performer (Claude-4.5-Sonnet) achieving only 38.6% success rate on complex, real-world tasks.
AI · Bearish · CoinTelegraph – AI · Feb 25 · 7/10 · 4
🧠Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.
AI · Bullish · Synced Review · May 15 · 7/10 · 9
🧠DeepSeek has released a 14-page technical paper on their V3 model, focusing on scaling challenges and hardware-aware co-design for low-cost large model training. The paper, co-authored by DeepSeek CEO Wenfeng Liang, reveals insights into cost-effective AI architecture development.
AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10 · 3
🧠Chinese AI company DeepSeek claims to have developed high-performing AI models using cost-effective training methods without relying on the most advanced semiconductor chips. This development could challenge the narrative that cutting-edge AI requires the most expensive hardware.
AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10 · 2
🧠Silicon Valley professionals are praising DeepSeek, a Chinese AI model, calling it 'amazing and impressive' despite being developed using less-advanced semiconductor chips. This recognition highlights China's ability to create competitive AI technology even under chip restrictions.