133 articles tagged with #computational-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 1d ago · 7/10
🧠 Researchers introduce Criticality-Aware Adversarial Training (CAAT), a parameter-efficient method that identifies and fine-tunes only the most robustness-critical parameters in Vision Transformers, achieving 94.3% of standard adversarial training robustness while tuning just 6% of model parameters. This breakthrough addresses the computational bottleneck preventing large-scale adversarial training deployment.
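As a rough illustration of the parameter-selection idea (the function names, scoring rule, and budget here are invented for this sketch; CAAT's actual criticality measure may differ), one could rank parameters by adversarial-loss gradient magnitude and update only the top fraction:

```python
# Hypothetical sketch of criticality-aware parameter selection: parameters
# whose adversarial-loss gradients have the largest magnitude are treated as
# "robustness-critical", and only those receive fine-tuning updates.

def select_critical(grads: dict[str, float], budget: float) -> set[str]:
    """Keep the top `budget` fraction of parameters by |gradient| magnitude."""
    ranked = sorted(grads, key=lambda name: abs(grads[name]), reverse=True)
    k = max(1, int(len(ranked) * budget))
    return set(ranked[:k])

def masked_update(params, grads, critical, lr=0.1):
    """Apply a gradient step only to the critical subset; freeze the rest."""
    return {name: (value - lr * grads[name]) if name in critical else value
            for name, value in params.items()}

params = {"w1": 0.5, "w2": -1.2, "w3": 0.3, "w4": 2.0}
grads = {"w1": 0.9, "w2": -0.05, "w3": -1.4, "w4": 0.1}
critical = select_critical(grads, budget=0.5)   # the paper tunes ~6%; 50% here
params = masked_update(params, grads, critical)
```

In a real setting the gradients would be taken against adversarially perturbed inputs, and the frozen parameters would simply be excluded from the optimizer.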
AI · Bullish · arXiv – CS AI · 1d ago · 7/10
🧠 Researchers present Chain-of-Models Pre-Training (CoM-PT), a novel method that accelerates vision foundation model training by up to 7.09× through sequential knowledge transfer from smaller to larger models in a unified pipeline, rather than training each model independently. The approach maintains or improves performance while significantly reducing computational costs, with efficiency gains increasing as more models are added to the training sequence.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers introduce SpecMoE, a new inference system that applies speculative decoding to Mixture-of-Experts language models to improve computational efficiency. The approach achieves up to 4.30× throughput improvements while reducing memory and bandwidth requirements without requiring model retraining.
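The speculative-decoding loop at the heart of systems like this can be sketched generically (this is the standard draft-then-verify pattern, not SpecMoE's actual implementation; `draft` and `target` are toy deterministic next-token functions standing in for real models):

```python
# A cheap draft model proposes several tokens, the expensive target model
# verifies them, and the longest agreeing prefix is accepted, so one target
# pass can yield multiple tokens.

def speculative_step(prefix, draft, target, k=4):
    proposals = []
    ctx = list(prefix)
    for _ in range(k):                      # draft k tokens autoregressively
        tok = draft(ctx)
        proposals.append(tok)
        ctx.append(tok)
    accepted = []
    ctx = list(prefix)
    for tok in proposals:                   # target checks each proposal
        if target(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))    # replace first mismatch, stop
            break
    return prefix + accepted

draft = lambda ctx: len(ctx) % 3            # toy models that mostly agree
target = lambda ctx: (len(ctx) % 3) if len(ctx) < 5 else 9
out = speculative_step([0, 1], draft, target, k=4)   # [0, 1, 2, 0, 1, 9]
```

For MoE models the verification pass is where the expert-routing and bandwidth savings would come in, since a batch of draft tokens can be verified together.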
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers introduce Deep Optimizer States, a technique that reduces GPU memory constraints during large language model training by dynamically offloading optimizer state between host and GPU memory during computation cycles. The method achieves 2.5× faster iterations compared to existing approaches by better managing the memory fluctuations inherent in transformer training pipelines.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers introduce RL^V, a reinforcement learning method that unifies LLM reasoners with generative verifiers to improve test-time compute scaling. The approach achieves over 20% accuracy gains on MATH benchmarks and enables 8-32× more efficient test-time scaling compared to existing RL methods by preserving and leveraging learned value functions.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers present MoEITS, a novel algorithm for simplifying Mixture-of-Experts large language models while maintaining performance and reducing computational costs. The method outperforms existing pruning techniques across multiple benchmark models including Mixtral 8×7B and DeepSeek-V2-Lite, addressing the energy and resource efficiency challenges of deploying advanced LLMs.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers introduce PnP-CM, a new method that reformulates consistency models as proximal operators within plug-and-play frameworks for solving inverse problems. The approach achieves high-quality image reconstructions with minimal neural function evaluations (4 NFEs), demonstrating practical efficiency gains over existing consistency model solvers and marking the first application of CMs to MRI data.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠 Researchers propose a cost-effective proxy model framework that uses smaller, efficient models to approximate the interpretability explanations of expensive Large Language Models (LLMs), achieving over 90% fidelity at just 11% of computational cost. The framework includes verification mechanisms and demonstrates practical applications in prompt compression and data cleaning, making interpretability tools viable for real-world LLM development.
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠 Researchers demonstrate that large speech language models contain significant redundancy in their token representations, particularly in deeper layers. By introducing Affinity Pooling, a training-free token merging technique, they achieve a 27.48% reduction in prefilling FLOPs and up to 1.7× memory savings while maintaining semantic accuracy, challenging the necessity of fully distinct tokens for acoustic processing.
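A minimal sketch of training-free token merging in this spirit (the similarity threshold and adjacent-pair merging rule are assumptions for illustration, not Affinity Pooling's exact procedure): adjacent token embeddings that are nearly parallel get averaged into one token, shrinking the sequence the model must prefill.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def merge_tokens(tokens, threshold=0.95):
    """Average runs of adjacent, highly similar token embeddings."""
    merged = [tokens[0]]
    for tok in tokens[1:]:
        if cosine(merged[-1], tok) >= threshold:   # redundant: fold into previous
            merged[-1] = [(a + b) / 2 for a, b in zip(merged[-1], tok)]
        else:
            merged.append(tok)
    return merged

seq = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]
out = merge_tokens(seq, threshold=0.9)   # 4 tokens collapse to 2
```

Because no parameters change, such merging can be applied at inference time to an already-trained model, which is what makes the technique "training-free".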
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠 Q-Zoom is a new framework that improves the efficiency of multimodal large language models by intelligently processing high-resolution visual inputs. Using adaptive query-aware perception, the system achieves 2.5-4.4× faster inference speeds on document and high-resolution tasks while maintaining or exceeding baseline accuracy across multiple MLLM architectures.
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠 Researchers introduce SPICE, a data selection algorithm that reduces large language model training data requirements by 90% while maintaining performance by identifying and minimizing gradient conflicts between training samples. The method combines information-theoretic principles with practical efficiency improvements, enabling effective model tuning on just 10% of typical datasets across multiple benchmarks.
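The gradient-conflict idea can be illustrated with a deliberately simplified greedy rule (this selection criterion is an assumption for the sketch; SPICE's actual method also incorporates information-theoretic scoring): keep a sample only if its per-sample gradient does not point against the running average gradient of the samples kept so far.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_samples(grads):
    """Greedily keep samples whose gradients do not conflict with the kept set."""
    kept, avg = [], None
    for i, g in enumerate(grads):
        if avg is None or dot(g, avg) >= 0:      # no conflict: keep sample
            kept.append(i)
            n = len(kept)
            avg = g if n == 1 else [(a * (n - 1) + x) / n for a, x in zip(avg, g)]
    return kept

# Toy per-sample gradients; the third points against the running average.
grads = [[1.0, 0.0], [0.5, 0.5], [-1.0, -0.2], [0.2, 1.0]]
kept = select_samples(grads)   # sample 2 is dropped as conflicting
```

In practice per-sample gradients are high-dimensional and expensive, so methods in this space typically work with low-dimensional gradient projections rather than full gradients.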
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers introduce k-Maximum Inner Product (k-MIP) attention for graph transformers, enabling linear memory complexity and up to 10× speedups while maintaining full expressive power. The innovation allows processing of graphs with over 500k nodes on a single GPU and demonstrates top performance on benchmark datasets.
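One reading of the core idea, sketched in plain Python (this is a toy interpretation, not the paper's implementation, which would retrieve the top-k inner products with an efficient index rather than a full scan): each query attends only to the k keys with the largest inner products, so attention memory scales with k instead of the full node count.

```python
import math

def kmip_attention(query, keys, values, k=2):
    """Softmax attention restricted to the k keys maximizing <query, key>."""
    scores = [sum(q * kk for q, kk in zip(query, key)) for key in keys]
    topk = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in topk}        # softmax over top-k only
    z = sum(exps.values())
    return [sum(exps[i] / z * values[i][d] for i in topk)
            for d in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
out = kmip_attention([2.0, 0.0], keys, values, k=2)   # attends to keys 0 and 2
```

With k fixed, per-query memory is O(k) rather than O(n), which is what would let graphs with hundreds of thousands of nodes fit on one GPU.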
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠 Researchers propose HIVE, a new framework for training large language models more efficiently in reinforcement learning by selecting high-utility prompts before rollout. The method uses historical reward data and prompt entropy to identify the 'learning edge' where models learn most effectively, significantly reducing computational overhead without performance loss.
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers propose DIG, a training-free framework that improves long-form video understanding by adapting frame selection strategies based on query types. The system uses uniform sampling for global queries and specialized selection for localized queries, achieving better performance than existing methods while scaling to 256 input frames.
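The two-branch selection strategy can be sketched as follows (the query classification and relevance scores are stand-ins for DIG's actual components, which this sketch does not reproduce): global queries get frames spread uniformly across the video, while localized queries get the frames scoring highest against the query.

```python
def uniform_sample(num_frames, budget):
    """Pick `budget` frame indices evenly spaced across the video."""
    step = num_frames / budget
    return [int(i * step) for i in range(budget)]

def select_frames(query_is_global, relevance, budget):
    if query_is_global:
        return uniform_sample(len(relevance), budget)
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(ranked[:budget])           # top scorers, in temporal order

scores = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.1, 0.1]   # toy per-frame relevance
frames_global = select_frames(True, scores, budget=4)    # spread across video
frames_local = select_frames(False, scores, budget=4)    # cluster on evidence
```

The appeal of this kind of routing is that it needs no training: only the query classifier and the frame scorer run at inference time, in front of an unchanged video-language model.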
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers introduce EcoAlign, a new framework for aligning Large Vision-Language Models that treats alignment as an economic optimization problem. The method balances safety, utility, and computational costs while preventing harmful reasoning disguised with benign justifications, showing superior performance across multiple models and datasets.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers have discovered that large AI models develop decomposable internal structures during training, with many parameter dependencies remaining statistically unchanged from initialization. They propose a post-training method to identify and remove unsupported dependencies, enabling parallel inference without modifying model functionality.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers introduce AgentDiet, a trajectory reduction technique that cuts computational costs for LLM-based agents by 39.9%-59.7% in input tokens and 21.1%-35.9% in total costs while maintaining performance. The approach removes redundant and expired information from agent execution trajectories during inference time.
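A simplified take on trajectory reduction (the pruning rules below are assumptions for illustration, not AgentDiet's exact policy): drop exact-duplicate observations and steps older than a freshness window before re-prompting the agent, shrinking the input token count.

```python
def prune_trajectory(steps, now, max_age=10):
    """Filter an agent trajectory: drop expired and duplicate observations."""
    seen, kept = set(), []
    for step in steps:
        if now - step["t"] > max_age:        # expired: too old to matter
            continue
        if step["obs"] in seen:              # redundant: already in context
            continue
        seen.add(step["obs"])
        kept.append(step)
    return kept

trajectory = [
    {"t": 1, "obs": "door locked"},
    {"t": 5, "obs": "found key"},
    {"t": 12, "obs": "found key"},           # duplicate observation
    {"t": 14, "obs": "door open"},
]
kept = prune_trajectory(trajectory, now=15, max_age=10)
```

Because the pruning happens purely on the prompt side at inference time, the agent's underlying model and tools are untouched; only the context it is re-shown shrinks.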
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers developed Prefix-Shared KV Cache (PSKV), a new technique that accelerates jailbreak attacks on Large Language Models by 40% while reducing memory usage by 50%. The method optimizes the red-teaming process by sharing cached prefixes across multiple attack attempts, enabling more efficient parallel inference without compromising attack success rates.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers developed SToRM, a new framework that reduces computational costs for autonomous driving systems using multi-modal large language models by up to 30× while maintaining performance. The system uses supervised token reduction techniques to enable real-time end-to-end driving on standard GPUs without sacrificing safety or accuracy.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers introduce Mixture-of-Depths Attention (MoDA), a new mechanism for large language models that allows attention heads to access key-value pairs from both current and preceding layers to combat signal degradation in deeper models. Testing on 1.5B-parameter models shows MoDA reduces perplexity by 0.2 and improves downstream task performance by 2.11% with only 3.7% computational overhead while maintaining 97.3% of FlashAttention-2's efficiency.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers introduce AutoTool, a new reinforcement learning approach that enables AI agents to automatically scale their reasoning capabilities for tool use. The method uses entropy-based optimization and supervised fine-tuning to help models efficiently determine appropriate thinking lengths for simple versus complex problems, achieving 9.8% accuracy improvements while reducing computational overhead by 81%.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers propose Budget-Aware Value Tree (BAVT), a training-free framework that improves LLM agent efficiency by intelligently managing computational resources during multi-hop reasoning tasks. The system outperforms traditional approaches while using 4× fewer resources, demonstrating that smart budget management beats brute-force compute scaling.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers propose ReBalance, a training-free framework that optimizes Large Reasoning Models by addressing overthinking and underthinking issues through confidence-based guidance. The solution dynamically adjusts reasoning trajectories without requiring model retraining, showing improved accuracy across multiple AI benchmarks.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers propose SEER (Self-Enhancing Efficient Reasoning), a framework that compresses Chain-of-Thought reasoning in Large Language Models while maintaining accuracy. After finding that longer reasoning chains don't always improve performance and can increase latency by up to 5×, the method achieves a 42.1% reduction in CoT length while improving accuracy.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠 Researchers propose Traversal-as-Policy, a method that distills AI agent execution logs into Gated Behavior Trees (GBTs) to create safer, more efficient autonomous agents. The approach significantly improves success rates while reducing safety violations and computational costs across multiple benchmarks.