y0news

#deepseek News & Analysis

39 articles tagged with #deepseek. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Conflicts Make Large Reasoning Models Vulnerable to Attacks

Researchers discovered that large reasoning models (LRMs) like DeepSeek R1 and Llama become significantly more vulnerable to adversarial attacks when presented with conflicting objectives or ethical dilemmas. Testing across 1,300+ prompts revealed that safety mechanisms break down when internal alignment values compete, with neural representations of safety and functionality overlapping under conflict.

🧠 Llama
AI · Neutral · arXiv – CS AI · 6d ago · 7/10

An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications

A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.

๐Ÿข Anthropic๐Ÿง  GPT-5๐Ÿง  Claude
AINeutralarXiv โ€“ CS AI ยท 6d ago7/10
๐Ÿง 

The ATOM Report: Measuring the Open Language Model Ecosystem

A comprehensive study of the open language model ecosystem reveals that Chinese AI models, including Qwen and DeepSeek, have overtaken U.S.-developed models like Meta's Llama since summer 2025, with the gap continuing to widen. The research analyzes ~1.5K mainline open models across adoption metrics, market share, and performance to document this significant shift in AI development geography.

$ATOM๐Ÿข Hugging Face๐Ÿง  Llama
AIBearisharXiv โ€“ CS AI ยท Apr 77/10
๐Ÿง 

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.

🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

Researchers developed ODMA, a new memory allocation strategy that improves large language model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.

AI · Neutral · arXiv – CS AI · Mar 16 · 7/10

Semantic Invariance in Agentic AI

Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.
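The paper's framework isn't reproduced here, but the core measurement — whether a model returns the same answer across semantically equivalent phrasings — can be sketched as follows (the `ask` callable, the toy model, and all names are illustrative assumptions, not the paper's code):

```python
def invariance_rate(ask, paraphrase_sets):
    """Fraction of paraphrase sets for which the model's answer is
    identical across every phrasing of the same underlying problem.

    ask: callable mapping a prompt string to a normalized answer.
    paraphrase_sets: list of lists of semantically equivalent prompts.
    """
    invariant = 0
    for prompts in paraphrase_sets:
        answers = [ask(p) for p in prompts]
        if len(set(answers)) == 1:  # same answer for every phrasing
            invariant += 1
    return invariant / len(paraphrase_sets)

# Toy stand-in "model": keyword-based, so one set is invariant, one is not.
def toy_ask(prompt):
    return "4" if "2+2" in prompt or "two plus two" in prompt else prompt[:1]

sets = [
    ["What is 2+2?", "Compute two plus two."],  # invariant
    ["apple question", "banana question"],      # not invariant
]
print(invariance_rate(toy_ask, sets))  # → 0.5
```

A score like Qwen3-30B-A3B's 79.6% would correspond to this rate computed over the paper's 19 reasoning problems and their paraphrases.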

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.

🧠 ChatGPT · 🧠 Claude · 🧠 Sonnet
AI · Bearish · arXiv – CS AI · Mar 12 · 7/10

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.

🧠 Gemini
AI · Bullish · Wired – AI · Mar 11 · 7/10

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show

Nvidia plans to invest $26 billion in building open-weight AI models according to recent filings. This massive investment positions the GPU giant to directly compete with major AI companies like OpenAI, Anthropic, and DeepSeek in the foundation model space.

๐Ÿข OpenAI๐Ÿข Anthropic๐Ÿข Nvidia
AIBullisharXiv โ€“ CS AI ยท Mar 117/10
๐Ÿง 

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean Satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.
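SAT problems suit this setup because instances can be generated at any difficulty and solutions verified mechanically, which is exactly what RL reward signals need. A minimal sketch of both pieces (function names, the DIMACS-style literal encoding, and the curriculum parameters are illustrative assumptions, not SATURN's implementation):

```python
import random

def random_cnf(n_vars, n_clauses, k=3, seed=0):
    """Random k-SAT instance: each clause is a tuple of nonzero ints,
    where +i / -i mean variable i asserted / negated (DIMACS-style).
    Difficulty is controlled by n_vars and the clause-to-variable ratio."""
    rng = random.Random(seed)
    return [
        tuple(rng.choice([-1, 1]) * v
              for v in rng.sample(range(1, n_vars + 1), k))
        for _ in range(n_clauses)
    ]

def check(formula, assignment):
    """Automated verifier: True iff the assignment (var -> bool) satisfies
    every clause. This binary outcome can serve as the RL reward."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

# Curriculum: start easy, then raise variables and clause density.
easy = random_cnf(n_vars=5, n_clauses=5)
hard = random_cnf(n_vars=20, n_clauses=85)

all_true = {v: True for v in range(1, 6)}
print(check(easy, all_true))  # mechanically decidable, no human labels needed
```

Scalable generation plus cheap verification is what makes curriculum-style difficulty control tractable here, in contrast to RL tasks that require hand-written or model-graded answers.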

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Researchers introduce Tool Decathlon (Toolathlon), a comprehensive benchmark for evaluating AI language agents across 32 software applications and 604 tools in realistic, multi-step scenarios. The benchmark reveals significant limitations in current AI models, with the best performer (Claude-4.5-Sonnet) achieving only a 38.6% success rate on complex, real-world tasks.

AI · Bearish · CoinTelegraph – AI · Feb 25 · 7/10

Anthropic says it's been targeted in massive distillation attacks

Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.

AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10

What to Know About China's DeepSeek AI

Chinese AI company DeepSeek claims to have developed high-performing AI models using cost-effective training methods without relying on the most advanced semiconductor chips. This development could potentially challenge the narrative that cutting-edge AI requires the most expensive hardware.

AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10

Silicon Valley Is Raving About a Made-in-China AI Model

Silicon Valley professionals are praising DeepSeek, a Chinese AI model, calling it 'amazing and impressive' despite being developed using less-advanced semiconductor chips. This recognition highlights China's ability to create competitive AI technology even under chip restrictions.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.

🧠 GPT-4 · 🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Researchers introduce Generative Adversarial Reasoner, a new training framework that improves LLM mathematical reasoning by using adversarial reinforcement learning between a reasoner and discriminator model. The method achieved significant performance gains on mathematical benchmarks, improving DeepSeek models by 7-10 percentage points on AIME24 tests.

🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Shorten After You're Right: Lazy Length Penalties for Reasoning RL

Researchers propose a new method to reduce the length of reasoning paths in large AI models like OpenAI o1 and DeepSeek R1 without additional training stages. The approach integrates reward designs directly into reinforcement learning, achieving 40% shorter responses in logic tasks with 14% performance improvement, and 33% reduction in math problems while maintaining accuracy.
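The "lazy" idea — penalize length only once an answer is already correct, so the model is never nudged toward brevity at the expense of accuracy — can be sketched as a reward-shaping rule (the function name, the linear penalty form, and the constants are illustrative assumptions, not the paper's exact design):

```python
def lazy_length_reward(correct, n_tokens, budget=512, alpha=0.5):
    """Reward for one sampled response in reasoning RL.

    Wrong answers score 0 regardless of length, so shortening is never
    rewarded before correctness is achieved ("shorten after you're right").
    Correct answers earn 1.0 minus a penalty growing with tokens beyond
    the budget, floored so a correct answer always beats a wrong one.
    """
    if not correct:
        return 0.0
    overshoot = max(0, n_tokens - budget) / budget
    return max(1.0 - alpha * overshoot, 0.1)

print(lazy_length_reward(True, 400))   # → 1.0 (correct, under budget)
print(lazy_length_reward(True, 1024))  # → 0.5 (correct, 2x over budget)
print(lazy_length_reward(False, 100))  # → 0.0 (wrong: length irrelevant)
```

Because the penalty only activates on correct responses, such a term can be folded directly into an existing RL objective without a separate length-compression training stage, which matches the paper's framing.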

๐Ÿข OpenAI๐Ÿง  o1
AIBullisharXiv โ€“ CS AI ยท Mar 176/10
๐Ÿง 

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Researchers developed E2H Reasoner, a curriculum reinforcement learning method that improves LLM reasoning by training on tasks ordered from easy to hard. The approach shows significant improvements for small LLMs (1.5B-3B parameters) that struggle with vanilla RL training alone.

AI · Bearish · arXiv – CS AI · Mar 9 · 6/10

The Fragility Of Moral Judgment In Large Language Models

Researchers tested the stability of moral judgments in large language models using nearly 3,000 ethical dilemmas, finding that narrative framing and evaluation methods significantly influence AI decisions. The study reveals that LLM moral reasoning is highly dependent on how questions are presented rather than underlying moral substance, with only 35.7% consistency across different evaluation protocols.

🧠 GPT-4 · 🧠 Claude
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Benchmarking Overton Pluralism in LLMs

Researchers introduced OVERTONBENCH, a framework for measuring viewpoint diversity in large language models through the OVERTONSCORE metric. In a study of 8 LLMs with 1,208 participants, models scored 0.35-0.41 out of 1.0, with DeepSeek V3 performing best, showing significant room for improvement in pluralistic representation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek

Researchers introduced Seek-CAD, a new system that uses the open-source DeepSeek-R1 language model to generate 3D CAD models locally without requiring expensive cloud-based AI services. The system incorporates visual feedback and self-refinement mechanisms to improve CAD model generation, potentially making AI-assisted design more accessible for industrial applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

RepoRepair: Leveraging Code Documentation for Repository-Level Automated Program Repair

RepoRepair is a new AI-powered automated program repair system that uses hierarchical code documentation to fix bugs across entire software repositories. The system achieves a 45.7% repair rate on SWE-bench Lite at $0.44 per fix by leveraging LLMs like DeepSeek-V3 and Claude-4 for fault localization and code repair.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty

Researchers developed ARLCP, a reinforcement learning framework that reduces unnecessary reflection in Large Reasoning Models, achieving 53% shorter responses while improving accuracy by 5.8% on smaller models. The method addresses computational inefficiencies in AI reasoning by dynamically balancing efficiency and accuracy through adaptive penalties.
