AI · Neutral · arXiv – CS AI · 5d ago · 7/10
🧠Researchers propose a compute-aware evaluation framework for assessing adversarial robustness in large language models, measuring attack effort in FLOPs rather than fixed query budgets. Tests across multiple models and attack strategies show that alignment training has non-monotonic effects on robustness, that scaling reduces the success of gradient-based attacks but not of cheaper template-based ones, and that safety measures leave certain harm categories disproportionately accessible.
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠Researchers develop an economic model combining scaling laws with microeconomic theory to determine profit-optimal LLM training strategies. The model reveals that optimal model size and training expenditure depend on hardware efficiency, data availability, and market adoption thresholds, with current industry trends appearing suboptimal in data-constrained scenarios.
AI · Bullish · arXiv – CS AI · Jun 9 · 7/10
🧠Researchers develop a methodology for predicting large language model performance based on compute budgets using prescriptive scaling laws, validated across 7,000 model checkpoints from 2022-2026. The work introduces Proteus-2k, a performance evaluation dataset, and demonstrates that capability boundaries can be reliably estimated with 80% fewer evaluations while maintaining accuracy.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers demonstrate that sparse neural networks can improve scaling efficiency in data-limited training scenarios, where models must train multiple epochs on repeated data. The study introduces a scaling law predicting performance across varying sparsity levels (up to 93.75%), finding that moderate sparsity around 50% optimizes loss while higher sparsity improves compute efficiency, challenging assumptions that sparsity is purely an efficiency tool.
AI · Bullish · arXiv – CS AI · May 29 · 7/10
🧠Researchers introduce Reasoning in Memory (RiM), a method that lets large language models reason internally using fixed memory blocks instead of generating intermediate tokens. The approach matches or exceeds existing reasoning methods while being more compute-efficient, because the memory blocks are processed in a single forward pass rather than through autoregressive generation.
AI · Neutral · Crypto Briefing · May 9 · 7/10
🧠SpaceX has entered a partnership with Anthropic to enhance AI compute capabilities, potentially reshaping competition with OpenAI. The development highlights growing concerns about tech industry transformation efficiency and the critical importance of model optimization in the AI race.
🏢 OpenAI🏢 Anthropic
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers demonstrate that training loss curves for large language models can collapse onto universal trajectories when hyperparameters are optimally set, enabling more efficient LLM training. They introduce Celerity, a competitive LLM family developed using these insights, and show that deviation from collapse can serve as an early diagnostic for training issues.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed a new approach to quantization-aware training (QAT) that optimizes compute allocation between full-precision and quantized training phases. They discovered that contrary to previous findings, the optimal ratio of QAT to full-precision training increases with total compute budget, and derived scaling laws to predict optimal configurations across different model sizes and bit widths.
AI · Bullish · OpenAI News · May 5 · 7/10 · 4
🧠A new analysis finds that the compute required to train a neural network to a fixed level of ImageNet classification performance has halved every 16 months since 2012. Training a network to AlexNet-level performance now takes 44 times less compute than it did in 2012, far outpacing Moore's Law, which would yield only an 11x cost reduction over the same period.
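As a rough check on the figures in this summary, the sketch below compounds a halving every 16 months and compares it with a Moore's Law baseline. The 84-month span (2012 to 2019) and the 24-month Moore's Law doubling period are assumptions that the summary does not state, and the function names are purely illustrative.

```python
import math


def efficiency_gain(months: float, halving_months: float) -> float:
    """Factor by which required compute shrinks if it halves every `halving_months`."""
    return 2.0 ** (months / halving_months)


def implied_halving_months(months: float, gain: float) -> float:
    """Halving period that would produce `gain` over `months`."""
    return months / math.log2(gain)


# Assumption: the comparison window is 2012 to 2019, i.e. 84 months.
SPAN_MONTHS = 84

# Halving every 16 months, as stated in the summary.
algo_gain = efficiency_gain(SPAN_MONTHS, 16)

# Assumption: Moore's Law modeled as a doubling every 24 months.
moore_gain = efficiency_gain(SPAN_MONTHS, 24)

# Halving period implied by the reported 44x reduction over the assumed window.
halving = implied_halving_months(SPAN_MONTHS, 44)

print(f"16-month halving over {SPAN_MONTHS} months: {algo_gain:.1f}x")
print(f"24-month doubling over {SPAN_MONTHS} months: {moore_gain:.1f}x")
print(f"Halving period implied by 44x: {halving:.1f} months")
```

Under these assumptions the 16-month halving compounds to roughly 38x, the same order as the reported 44x, and the 24-month baseline gives roughly 11x, in line with the Moore's Law figure quoted in the summary.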
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers introduce SDPG, a visual reinforcement learning method that trains robotic control policies significantly faster and more efficiently on consumer GPUs. The approach reduces computational overhead through stochastic gradient estimation while maintaining superior performance, and includes new benchmarks for advancing visual robotics research.
🏢 Nvidia
AI · Bullish · arXiv – CS AI · May 7 · 6/10
🧠Researchers propose Predict-then-Diffuse, a framework that optimizes diffusion-based large language models by predicting required response length before generation, reducing computational waste from padding tokens and re-computation overhead while maintaining output quality across multiple datasets.