9 articles tagged with #throughput. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers developed ODMA, a new memory allocation strategy that improves Large Language Model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.
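A rough sketch of the bucket-partitioning idea described above: split a fixed pool of KV-cache blocks across requests in proportion to each request's predicted generation length. The function name and the proportional-split heuristic are assumptions for illustration, not ODMA's actual algorithm.

```python
def partition_buckets(total_blocks, predicted_lengths):
    """Split a fixed pool of KV-cache blocks into per-request buckets
    sized in proportion to each request's predicted generation length."""
    total_pred = sum(predicted_lengths)
    buckets = [max(1, total_blocks * p // total_pred) for p in predicted_lengths]
    # Give any rounding remainder (positive or negative) to the request
    # with the longest prediction, so the pool is exactly consumed.
    buckets[predicted_lengths.index(max(predicted_lengths))] += total_blocks - sum(buckets)
    return buckets
```

As length predictions update between generation steps, the partition can be recomputed, which is one way such a scheme could remain adaptive.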
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers conducted comprehensive benchmarks of LLM inference on AMD Instinct MI325X GPUs, testing models from 235B to 1 trillion parameters. The study reveals that architecture-aware optimization is critical, with different model types requiring specific configurations for optimal performance on AMD hardware.
🧠 Llama
AI · Bullish · MarkTechPost · Mar 11 · 7/10
🧠 NVIDIA has released Nemotron 3 Super, a 120 billion parameter open-source AI model designed for multi-agent applications. The hybrid Mamba-Attention MoE model delivers 5x higher throughput and bridges the gap between proprietary frontier models and transparent open-source alternatives.
🟢 Nvidia
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
🧠 Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.
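The contrast between utility-based and time-based retention fits in a few lines. This is a toy illustration; AMV-L's actual utility model is not described in the summary above.

```python
import heapq

def evict_to_working_set(entries, max_items):
    """Keep the max_items highest-utility memories and evict the rest.
    A TTL policy would instead drop whatever is oldest, regardless of
    how useful it still is; here the working-set *size* is the control.
    entries: list of (utility, key) tuples."""
    if len(entries) <= max_items:
        return entries, []
    kept = heapq.nlargest(max_items, entries)
    kept_keys = {key for _, key in kept}
    evicted = [e for e in entries if e[1] not in kept_keys]
    return kept, evicted
```

Bounding the working set directly caps memory pressure on the serving path, which is one plausible source of the throughput and latency gains the summary reports.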
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 Researchers propose SUN (Shared Use of Next-token Prediction), a novel approach for multi-LLM serving that enables cross-model sharing of decode execution by decomposing transformers into separate prefill and decode modules. The system achieves up to 2.0x throughput improvement per GPU while maintaining accuracy comparable to full fine-tuning, with a quantized version (QSUN) providing additional 45% speedup.
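The prefill/decode decomposition can be pictured as per-model prefill modules feeding one shared decode module. The class and function names below are invented for illustration; SUN's real interfaces are not given in the summary.

```python
class SharedDecodeServer:
    """Several models keep their own prefill modules but route decode
    steps through a single shared decode module."""

    def __init__(self, shared_decode):
        self.decode = shared_decode   # one decode fn serving all models
        self.prefills = {}            # model_id -> per-model prefill fn

    def register(self, model_id, prefill):
        self.prefills[model_id] = prefill

    def generate(self, model_id, prompt, steps):
        state = self.prefills[model_id](prompt)   # model-specific prefill
        out = []
        for _ in range(steps):
            token, state = self.decode(state)     # shared decode pass
            out.append(token)
        return out
```

Because the decode module is shared, its weights occupy GPU memory once rather than once per model, which is the kind of consolidation that could raise per-GPU throughput.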
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Nightjar is a new adaptive speculative decoding framework for large language models that dynamically adjusts to system load conditions. It achieves 27.29% higher throughput and up to 20.18% lower latency by intelligently enabling or disabling speculation based on workload demands.
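The load-adaptive toggle at the heart of this idea fits in one function. The thresholds and signal choices here are placeholders, not Nightjar's.

```python
def should_speculate(queue_depth, gpu_util, depth_limit=8, util_limit=0.85):
    """Enable speculative decoding only when the system is lightly loaded.
    Under heavy load, draft-model work steals compute the target model
    needs, so speculation is switched off."""
    return queue_depth < depth_limit and gpu_util < util_limit
```

A production system would likely add hysteresis so the decision does not flap when load hovers near the thresholds.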
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 OrbitFlow is a new KV cache management system for long-context LLM serving that uses adaptive memory allocation and fine-grained optimization to improve performance. The system achieves up to 66% better SLO attainment and 3.3x higher throughput by dynamically managing GPU memory usage during token generation.
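A minimal sketch of on-demand, block-granular KV-cache growth during generation. The class and its policy are assumptions for illustration; OrbitFlow's actual mechanism is more sophisticated.

```python
import math

class KVAllocator:
    """Grow each request's KV cache block-by-block as tokens are
    generated, instead of reserving worst-case length up front."""

    def __init__(self, total_blocks, block_tokens=16):
        self.free = total_blocks
        self.block_tokens = block_tokens
        self.owned = {}  # req_id -> blocks currently held

    def grow(self, req_id, total_tokens):
        """Ensure req_id holds enough blocks for total_tokens generated
        so far. Returns False when the pool is exhausted, signalling the
        scheduler to preempt or swap."""
        need = math.ceil(total_tokens / self.block_tokens)
        extra = need - self.owned.get(req_id, 0)
        if extra > self.free:
            return False
        if extra > 0:
            self.free -= extra
            self.owned[req_id] = need
        return True

    def release(self, req_id):
        """Return a finished request's blocks to the shared pool."""
        self.free += self.owned.pop(req_id, 0)
```

Allocating only what generation has actually consumed lets more requests run concurrently on the same GPU memory, which is the general route to better SLO attainment under long contexts.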
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17
🧠 Researchers developed a data-driven pipeline to optimize GPU efficiency for distributed LLM adapter serving, achieving sub-5% throughput estimation error while running 90x faster than full benchmarking. The system uses a Digital Twin, machine learning models, and greedy placement algorithms to minimize GPU requirements while serving hundreds of adapters concurrently.
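The greedy placement step can be approximated by first-fit-decreasing bin packing; whether the paper uses exactly this variant is an assumption, and the load numbers below are invented.

```python
def greedy_place(adapters, gpu_capacity):
    """First-fit-decreasing placement: sort adapters by estimated load,
    pack each onto the first GPU with headroom, and open a new GPU only
    when none fits, minimizing the GPUs needed to serve all adapters.
    adapters: list of (name, estimated_load) pairs."""
    gpus = []        # remaining capacity per GPU
    placement = {}   # adapter name -> GPU index
    for name, load in sorted(adapters, key=lambda a: -a[1]):
        for i, cap in enumerate(gpus):
            if load <= cap:
                gpus[i] -= load
                placement[name] = i
                break
        else:
            gpus.append(gpu_capacity - load)
            placement[name] = len(gpus) - 1
    return placement, len(gpus)
```

In the pipeline described above, the per-adapter load estimates would come from the learned throughput model rather than from full benchmarking.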
AI · Neutral · Google Research Blog · Feb 11 · 3/10 · 7
🧠 A research article on algorithmic optimization for scheduling systems with time-varying capacity constraints, addressing theoretical approaches to maximizing throughput in dynamic environments where system capacity changes over time.