y0news

#transformer-performance News & Analysis

1 article tagged with #transformer-performance. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 article
AI · Bullish · arXiv – CS AI · 15h ago · 7/10

Qrita: High-performance Top-k and Top-p using Pivot-based Truncation and Selection

Researchers introduce Qrita, an efficient algorithm for Top-k and Top-p sampling in large language models that uses pivot-based truncation instead of sorting. The method achieves 1.4x throughput improvements with 50% less memory usage while maintaining identical output to traditional sorting approaches, and has been adopted as the default sampler in vLLM.
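
To make the idea of selecting by pivot rather than by sorting concrete, here is a minimal pure-Python sketch. It is not Qrita's implementation or vLLM's sampler code. It only assumes the general quickselect-style technique: pick a pivot, partition the candidates around it, and discard the side that cannot contain the cutoff. The function names `top_k_pivot` and `top_p_pivot` are hypothetical, and tie handling and floating-point behaviour are simplified.

```python
import random


def _partition(probs, candidates, pivot):
    """Split candidate indices into those above, equal to, and below the pivot value."""
    above = [i for i in candidates if probs[i] > pivot]
    equal = [i for i in candidates if probs[i] == pivot]
    below = [i for i in candidates if probs[i] < pivot]
    return above, equal, below


def top_k_pivot(probs, k, rng=None):
    """Indices of the k largest values, found by repeated pivoting (no full sort)."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for reproducibility
    candidates = list(range(len(probs)))
    kept, need = [], k
    while need > 0 and candidates:
        pivot = probs[rng.choice(candidates)]
        above, equal, below = _partition(probs, candidates, pivot)
        if len(above) >= need:
            candidates = above  # the cutoff lies strictly above the pivot
        elif len(above) + len(equal) >= need:
            kept += above + equal[: need - len(above)]  # cutoff is at the pivot
            need = 0
        else:
            kept += above + equal  # everything here is in the top k
            need -= len(above) + len(equal)
            candidates = below
    return sorted(kept)


def top_p_pivot(probs, p, rng=None):
    """Smallest set of highest-probability indices whose total mass reaches p."""
    rng = rng or random.Random(0)
    candidates = list(range(len(probs)))
    kept, remaining = [], p  # remaining = probability mass still needed
    while remaining > 0 and candidates:
        pivot = probs[rng.choice(candidates)]
        above, equal, below = _partition(probs, candidates, pivot)
        mass_above = sum(probs[i] for i in above)
        if mass_above >= remaining:
            candidates = above  # the cutoff lies strictly above the pivot
        else:
            kept += above
            remaining -= mass_above
            for i in equal:  # add tied values one at a time until p is reached
                if remaining <= 0:
                    break
                kept.append(i)
                remaining -= probs[i]
            candidates = below
    return sorted(kept)


probs = [0.1, 0.4, 0.05, 0.3, 0.15]
print(top_k_pivot(probs, 2))    # [1, 3]
print(top_p_pivot(probs, 0.8))  # [1, 3, 4]
```

Each round throws away part of the candidate list, so the expected work is linear in the vocabulary size rather than the O(n log n) of a full sort. The kept set is the same one a sort-based cutoff would produce when values are distinct.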