39 articles tagged with #deepseek. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers discovered that large reasoning models (LRMs) like DeepSeek R1 and Llama become significantly more vulnerable to adversarial attacks when presented with conflicting objectives or ethical dilemmas. Testing across 1,300+ prompts revealed that safety mechanisms break down when internal alignment values compete, with neural representations of safety and functionality overlapping under conflict.
🧠 Llama
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠 A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.
🏢 Anthropic · 🧠 GPT-5 · 🧠 Claude
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠 A comprehensive study of the open language model ecosystem reveals that Chinese AI models, including Qwen and DeepSeek, have overtaken U.S.-developed models like Meta's Llama since summer 2025, with the gap continuing to widen. The research analyzes ~1.5K mainline open models across adoption metrics, market share, and performance to document this significant shift in AI development geography.
$ATOM · 🏢 Hugging Face · 🧠 Llama
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠 Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.
🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers developed ODMA, a new memory allocation strategy that improves Large Language Model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.
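ODMA's actual algorithm isn't given in the summary; as a rough, hypothetical sketch of the two ideas it names (predicting each request's generation length, then partitioning a fixed block pool into per-request buckets), here is a toy allocator in which the block size, ceiling rule, and proportional scaling are all assumptions:

```python
def partition_buckets(predicted_lens, pool_blocks, block_tokens=16):
    """Given per-request predicted generation lengths, reserve each
    request a bucket of KV-cache blocks sized for its predicted need,
    capped by the fixed pool of a memory-constrained accelerator."""
    need = [max(1, -(-n // block_tokens)) for n in predicted_lens]  # ceil division
    total = sum(need)
    if total <= pool_blocks:
        return need
    # Scale buckets down proportionally when the pool is oversubscribed,
    # guaranteeing every request at least one block.
    return [max(1, n * pool_blocks // total) for n in need]

print(partition_buckets([100, 400, 50], pool_blocks=64))  # [7, 25, 4]
```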
AI · Neutral · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.
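The stability score reported above (79.6% invariant responses) suggests a simple metric: the fraction of problems whose semantically equivalent phrasings all yield the same answer. A minimal sketch assuming that definition (the paper's exact scoring may differ):

```python
from collections import defaultdict

def invariant_response_rate(results):
    """results: list of (problem_id, answer) pairs, one pair per
    semantically equivalent phrasing of each problem. A problem counts
    as invariant only if every phrasing produced the same answer."""
    by_problem = defaultdict(list)
    for problem_id, answer in results:
        by_problem[problem_id].append(answer)
    invariant = sum(1 for answers in by_problem.values()
                    if len(set(answers)) == 1)
    return invariant / len(by_problem)

# Toy run: problem "a" is stable across 3 phrasings, "b" is not.
results = [("a", "yes"), ("a", "yes"), ("a", "yes"),
           ("b", "yes"), ("b", "no"), ("b", "yes")]
print(invariant_response_rate(results))  # 0.5
```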
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠 Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous-victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.
🧠 ChatGPT · 🧠 Claude · 🧠 Sonnet
AI · Bearish · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.
🧠 Gemini
AI · Bullish · Wired – AI · Mar 11 · 7/10
🧠 Nvidia plans to invest $26 billion in building open-weight AI models, according to recent filings. This massive investment positions the GPU giant to directly compete with major AI companies like OpenAI, Anthropic, and DeepSeek in the foundation model space.
🏢 OpenAI · 🏢 Anthropic · 🏢 Nvidia
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.
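SATURN's implementation isn't detailed in the summary, but its three claims (scalable task construction, automated verification, precise difficulty control) map naturally onto random k-SAT generation, where difficulty is steered by the clause-to-variable ratio and any candidate solution is machine-checkable. A toy sketch under those assumptions, not SATURN's actual code:

```python
import itertools
import random

def random_ksat(n_vars, ratio, k=3, rng=None):
    """Random k-SAT instance; for 3-SAT, difficulty peaks near a
    clause-to-variable ratio of ~4.26, so the ratio is the
    curriculum's difficulty knob."""
    rng = rng or random.Random(0)
    clauses = []
    for _ in range(int(ratio * n_vars)):
        vars_ = rng.sample(range(1, n_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in vars_])
    return clauses

def check(clauses, assignment):
    """Automated verifier: does the assignment satisfy every clause?"""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def solve(clauses, n_vars):
    """Brute-force ground truth, feasible for small instances."""
    for bits in itertools.product([False, True], repeat=n_vars):
        assignment = dict(enumerate(bits, start=1))
        if check(clauses, assignment):
            return assignment
    return None

inst = random_ksat(n_vars=6, ratio=3.0)
print(solve(inst, 6) is not None)
```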
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
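HEAPr's atomic-component decomposition is finer-grained than anything shown here, but the pruning loop it plugs into can be sketched with a crude stand-in importance score (the L2 norm of each expert's weights, an assumption purely for illustration):

```python
import math

def prune_experts(expert_weights, ratio=0.2):
    """Rank experts by an importance proxy (L2 norm of their weights;
    HEAPr scores at the level of atomic components instead) and drop
    the lowest-scoring `ratio` fraction."""
    scores = {name: math.sqrt(sum(w * w for w in ws))
              for name, ws in expert_weights.items()}
    n_drop = int(len(scores) * ratio)
    ranked = sorted(scores, key=scores.get)  # ascending importance
    keep = set(ranked[n_drop:])
    return {name: ws for name, ws in expert_weights.items() if name in keep}

experts = {f"e{i}": [0.1 * i] * 4 for i in range(10)}  # e0 least important
kept = prune_experts(experts, ratio=0.2)
print(sorted(kept))  # e0 and e1 pruned, 8 experts remain
```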
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 3
🧠 Researchers introduce Tool Decathlon (Toolathlon), a comprehensive benchmark for evaluating AI language agents across 32 software applications and 604 tools in realistic, multi-step scenarios. The benchmark reveals significant limitations in current AI models, with the best performer (Claude-4.5-Sonnet) achieving only a 38.6% success rate on complex, real-world tasks.
AI · Bearish · CoinTelegraph – AI · Feb 25 · 7/10 · 4
🧠 Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.
AI · Bullish · Synced Review · May 15 · 7/10 · 9
🧠 DeepSeek has released a 14-page technical paper on their V3 model, focusing on scaling challenges and hardware-aware co-design for low-cost large model training. The paper, co-authored by DeepSeek CEO Wenfeng Liang, reveals insights into cost-effective AI architecture development.
AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10 · 3
🧠 Chinese AI company DeepSeek claims to have developed high-performing AI models using cost-effective training methods without relying on the most advanced semiconductor chips. This development could potentially challenge the narrative that cutting-edge AI requires the most expensive hardware.
AI · Neutral · Wall Street Journal – Tech · Jan 27 · 7/10 · 2
🧠 Silicon Valley professionals are praising DeepSeek, a Chinese AI model, calling it 'amazing and impressive' despite being developed using less-advanced semiconductor chips. This recognition highlights China's ability to create competitive AI technology even under chip restrictions.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠 Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.
🧠 GPT-4 · 🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠 Researchers introduce Generative Adversarial Reasoner, a new training framework that improves LLM mathematical reasoning by using adversarial reinforcement learning between a reasoner and a discriminator model. The method achieved significant performance gains on mathematical benchmarks, improving DeepSeek models by 7-10 percentage points on AIME24 tests.
🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new method to reduce the length of reasoning paths in large AI models like OpenAI o1 and DeepSeek R1 without additional training stages. The approach integrates reward designs directly into reinforcement learning, achieving 40% shorter responses in logic tasks with a 14% performance improvement, and a 33% length reduction in math problems while maintaining accuracy.
🏢 OpenAI · 🧠 o1
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed E2H Reasoner, a curriculum reinforcement learning method that improves LLM reasoning by training on tasks from easy to hard. The approach shows significant improvements for small LLMs (1.5B-3B parameters) that struggle with vanilla RL training alone.
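The easy-to-hard idea can be illustrated with a toy curriculum sampler that shifts sampling weight from easy tasks toward hard ones as training progresses; the weighting rule below is an assumption for illustration, not E2H Reasoner's actual schedule:

```python
import random

def curriculum_sample(tasks, progress, rng):
    """tasks: list of (task, difficulty) with difficulty in [0, 1].
    progress: training fraction in [0, 1]. Tasks whose difficulty is
    closest to the current progress get the most sampling weight, so
    the mix drifts from easy to hard over training."""
    weights = [1.0 - abs(d - progress) for _, d in tasks]
    r = rng.random() * sum(weights)
    for (task, _), w in zip(tasks, weights):
        r -= w
        if r <= 0:
            return task
    return tasks[-1][0]

tasks = [("easy", 0.1), ("medium", 0.5), ("hard", 0.9)]
rng = random.Random(0)
early = [curriculum_sample(tasks, 0.0, rng) for _ in range(1000)]
late = [curriculum_sample(tasks, 1.0, rng) for _ in range(1000)]
print(early.count("easy") > early.count("hard"))  # True
print(late.count("hard") > late.count("easy"))    # True
```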
AI · Bearish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers tested the stability of moral judgments in large language models using nearly 3,000 ethical dilemmas, finding that narrative framing and evaluation methods significantly influence AI decisions. The study reveals that LLM moral reasoning is highly dependent on how questions are presented rather than on underlying moral substance, with only 35.7% consistency across different evaluation protocols.
🧠 GPT-4 · 🧠 Claude
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduced OVERTONBENCH, a framework for measuring viewpoint diversity in large language models through the OVERTONSCORE metric. In a study of 8 LLMs with 1,208 participants, models scored 0.35-0.41 out of 1.0, with DeepSeek V3 performing best, showing significant room for improvement in pluralistic representation.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduced Seek-CAD, a new system that uses the open-source DeepSeek-R1 language model to generate 3D CAD models locally without requiring expensive cloud-based AI services. The system incorporates visual feedback and self-refinement mechanisms to improve CAD model generation, potentially making AI-assisted design more accessible for industrial applications.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 RepoRepair is a new AI-powered automated program repair system that uses hierarchical code documentation to fix bugs across entire software repositories. The system achieves a 45.7% repair rate on SWE-bench Lite at $0.44 per fix by leveraging LLMs like DeepSeek-V3 and Claude-4 for fault localization and code repair.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 20
🧠 Researchers developed ARLCP, a reinforcement learning framework that reduces unnecessary reflection in Large Reasoning Models, achieving 53% shorter responses while improving accuracy by 5.8% on smaller models. The method addresses computational inefficiencies in AI reasoning by dynamically balancing efficiency and accuracy through adaptive penalties.
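The adaptive-penalty idea can be sketched as a shaped reward that docks correct-but-verbose answers while giving wrong answers no length incentive at all, so the model isn't pushed to truncate reasoning it still needs; the target length and weighting below are illustrative assumptions, not ARLCP's actual values:

```python
def shaped_reward(correct, n_tokens, target_len=256, alpha=0.5):
    """Correct answers earn 1.0 minus a penalty that grows with tokens
    spent beyond the target length (capped at alpha); wrong answers
    score 0.0 regardless of length."""
    if not correct:
        return 0.0
    overflow = max(0, n_tokens - target_len) / target_len
    return 1.0 - alpha * min(1.0, overflow)

print(shaped_reward(True, 200))   # 1.0: concise and correct
print(shaped_reward(True, 512))   # 0.5: correct but verbose
print(shaped_reward(False, 100))  # 0.0: wrong, no length reward
```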