y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#deepseek News & Analysis

53 articles tagged with #deepseek. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

53 articles
AIBullisharXiv – CS AI · 2d ago7/10
🧠

ESPO: Early-Stopping Proximal Policy Optimization

Researchers propose ESPO, an optimization technique that improves large language model training by detecting and terminating failed reasoning trajectories early rather than forcing completion. The method reduces computational waste by over 20% while achieving superior performance on mathematical reasoning benchmarks compared to standard PPO training.

AIBullisharXiv – CS AI · 3d ago7/10
🧠

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

Researchers propose Group-Query Latent Attention (GQLA), an advancement of DeepSeek's Multi-head Latent Attention that enables hardware-adaptive decoding through two algebraically equivalent inference paths without requiring model retraining. The innovation allows a single trained model to optimize performance across different hardware platforms—H100 GPUs and export-restricted H20 chips—while maintaining computational efficiency and supporting distributed tensor parallelism.

AIBullisharXiv – CS AI · 3d ago7/10
🧠

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Researchers present a systematic study of Attention-FFN Disaggregation (AFD), a technique that separates attention and expert layers across different GPU groups to optimize inference serving for Mixture-of-Experts language models. The framework demonstrates that AFD enables 4k tokens/s throughput on DeepSeek-V3.2 under strict latency constraints where traditional disaggregation approaches fail, providing design principles for scaling LLM infrastructure.

AIBearisharXiv – CS AI · 3d ago7/10
🧠

Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation

Researchers discovered that chain-of-thought distillation—training smaller AI models to imitate larger models' reasoning—produces higher answer accuracy on medical benchmarks while simultaneously degrading reasoning quality. A Qwen3-8B student model improved from 74.7% to 84.4% accuracy on MedQA-USMLE, yet error rates in individual reasoning steps jumped from 30.6% to 50.3%, suggesting models learn to mimic expert-like output without grounding claims in sound logic.

AIBearishDecrypt – AI · 3d ago7/10
🧠

DeepSeek, Xiaomi Just Made Frontier AI 99% Cheaper. American Labs Went the Other Way

Chinese AI labs DeepSeek and Xiaomi have dramatically slashed prices on their frontier AI models, making them approximately 99% cheaper than comparable American offerings like GPT-4.5 and Claude Opus. This pricing strategy represents a significant shift in the competitive landscape, with Chinese providers pursuing aggressive cost-based competition while American labs maintain premium pricing models.

DeepSeek, Xiaomi Just Made Frontier AI 99% Cheaper. American Labs Went the Other Way
🧠 GPT-5🧠 Claude
AIBullisharXiv – CS AI · May 117/10
🧠

MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference

Researchers introduce MISA, an optimization technique that reduces computational costs in DeepSeek's sparse attention mechanism for large language models by treating indexer heads as a mixture-of-experts system. The method achieves 3.82x speedup on GPU inference while maintaining performance across benchmarks, addressing a key bottleneck in long-context LLM processing.

🏢 Nvidia
AIBullishBlockonomi · May 47/10
🧠

Alibaba (BABA) Stock: Morgan Stanley Survey Crowns It China’s Leading AI Player

Morgan Stanley's latest survey ranks Alibaba as China's leading AI player, with 41% of CIOs selecting it as their top choice and the company experiencing over 40% cloud growth. The investment bank has set a $180 price target for BABA stock, signaling confidence in the company's AI dominance in the Chinese market.

AIBearisharXiv – CS AI · Apr 207/10
🧠

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

Researchers have discovered a critical vulnerability in Large Reasoning Models (LRMs) like DeepSeek R1 and OpenAI o4-mini that allows attackers to inject harmful content into the reasoning process while keeping final answers unchanged. The Psychology-based Reasoning-targeted Jailbreak Attack (PRJA) framework achieves an 83.6% success rate by exploiting semantic triggers and psychological principles, revealing a previously understudied safety gap in AI systems deployed in high-stakes domains.

🏢 OpenAI
AIBearisharXiv – CS AI · Apr 147/10
🧠

Conflicts Make Large Reasoning Models Vulnerable to Attacks

Researchers discovered that large reasoning models (LRMs) like DeepSeek R1 and Llama become significantly more vulnerable to adversarial attacks when presented with conflicting objectives or ethical dilemmas. Testing across 1,300+ prompts revealed that safety mechanisms break down when internal alignment values compete, with neural representations of safety and functionality overlapping under conflict.

🧠 Llama
AINeutralarXiv – CS AI · Apr 107/10
🧠

The ATOM Report: Measuring the Open Language Model Ecosystem

A comprehensive study of the open language model ecosystem reveals that Chinese AI models, including Qwen and DeepSeek, have overtaken U.S.-developed models like Meta's Llama since summer 2025, with the gap continuing to widen. The research analyzes ~1.5K mainline open models across adoption metrics, market share, and performance to document this significant shift in AI development geography.

$ATOM🏢 Hugging Face🧠 Llama
AINeutralarXiv – CS AI · Apr 107/10
🧠

An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications

A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.

🏢 Anthropic🧠 GPT-5🧠 Claude
AIBearisharXiv – CS AI · Apr 77/10
🧠

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.

🧠 GPT-5🧠 Gemini
AIBullisharXiv – CS AI · Mar 267/10
🧠

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

Researchers developed ODMA, a new memory allocation strategy that improves Large Language Model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.

AINeutralarXiv – CS AI · Mar 167/10
🧠

Semantic Invariance in Agentic AI

Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.

AINeutralarXiv – CS AI · Mar 127/10
🧠

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.

🧠 ChatGPT🧠 Claude🧠 Sonnet
AIBearisharXiv – CS AI · Mar 127/10
🧠

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.

🧠 Gemini
AIBullishWired – AI · Mar 117/10
🧠

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show

Nvidia plans to invest $26 billion in building open-weight AI models according to recent filings. This massive investment positions the GPU giant to directly compete with major AI companies like OpenAI, Anthropic, and DeepSeek in the foundation model space.

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show
🏢 OpenAI🏢 Anthropic🏢 Nvidia
AIBullisharXiv – CS AI · Mar 117/10
🧠

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean Satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.

AIBullisharXiv – CS AI · Mar 37/104
🧠

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.

AINeutralarXiv – CS AI · Feb 277/103
🧠

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Researchers introduce Tool Decathlon (Toolathlon), a comprehensive benchmark for evaluating AI language agents across 32 software applications and 604 tools in realistic, multi-step scenarios. The benchmark reveals significant limitations in current AI models, with the best performer (Claude-4.5-Sonnet) achieving only 38.6% success rate on complex, real-world tasks.

AIBearishCoinTelegraph – AI · Feb 257/104
🧠

Anthropic says it's been targeted in massive distillation attacks

Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.

Anthropic says it's been targeted in massive distillation attacks
AINeutralWall Street Journal – Tech · Jan 277/103
🧠

What to Know About China's DeepSeek AI

Chinese AI company DeepSeek claims to have developed high-performing AI models using cost-effective training methods without relying on the most advanced semiconductor chips. This development could potentially challenge the narrative that cutting-edge AI requires the most expensive hardware.

AINeutralWall Street Journal – Tech · Jan 277/102
🧠

Silicon Valley Is Raving About a Made-in-China AI Model

Silicon Valley professionals are praising DeepSeek, a Chinese AI model, calling it 'amazing and impressive' despite being developed using less-advanced semiconductor chips. This recognition highlights China's ability to create competitive AI technology even under chip restrictions.

Page 1 of 3Next →