8 articles tagged with #token-compression. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers developed LightThinker++, a new framework that enables large language models to compress intermediate reasoning thoughts and manage memory more efficiently. The system reduces peak token usage by up to 70% while improving accuracy by 2.42% and maintaining performance over extended reasoning tasks.
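The core idea of collapsing older reasoning steps into compact summaries so the working context stays bounded can be illustrated with a small sketch. Everything here is hypothetical (this is not the LightThinker++ implementation): `compress_thoughts`, `naive_summarize`, and the word-count token proxy are illustrative stand-ins.

```python
# Hypothetical sketch of thought compression: older reasoning steps are
# collapsed into a short summary so peak context size stays bounded.
# Word count is a crude stand-in for a real tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())

def compress_thoughts(steps, summarize, keep_recent=2):
    """Keep the most recent steps verbatim; summarize the rest into one line."""
    if len(steps) <= keep_recent:
        return list(steps)
    summary = summarize(steps[:-keep_recent])
    return [summary] + list(steps[-keep_recent:])

def naive_summarize(steps):
    # Placeholder for an LLM-generated summary: keep each step's first clause.
    return "summary: " + "; ".join(s.split(",")[0] for s in steps)

steps = [
    "first, restate the problem in terms of token budgets",
    "second, note the context grows linearly with each thought",
    "third, compare against the 70% peak-usage reduction target",
    "finally, verify accuracy on the held-out reasoning tasks",
]
compressed = compress_thoughts(steps, naive_summarize)
before = sum(count_tokens(s) for s in steps)
after = sum(count_tokens(s) for s in compressed)
```

In a real system the summarizer would itself be the model, and the compression schedule (how many recent steps to keep verbatim) would trade accuracy against peak memory.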
AI · Bullish · arXiv · CS AI · Apr 6 · 6/10
🧠 Researchers introduce Image Prompt Packaging (IPPg), a technique that embeds text directly into images to reduce multimodal AI inference costs by 35.8–91.0% while maintaining competitive accuracy. The method shows significant promise for cost optimization in large multimodal language models, though effectiveness varies by model and task type.
🧠 GPT-4 · 🧠 Claude
AI · Bullish · arXiv · CS AI · Mar 27 · 6/10
🧠 Photon is a new framework that efficiently processes 3D medical imaging for AI visual question answering by using variable-length token sequences and adaptive compression. The system reduces computational costs while maintaining accuracy through instruction-conditioned token scheduling and custom gradient propagation techniques.
AI · Bullish · arXiv · CS AI · Mar 16 · 6/10
🧠 Researchers developed a structured distillation method that compresses AI agent conversation history by 11x (from 371 to 38 tokens per exchange) while maintaining 96% of retrieval quality. The technique enables thousands of exchanges to fit within a single prompt at 1/11th the context cost, addressing the expensive verbatim storage problem for long AI conversations.
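The idea of replacing verbatim transcripts with small structured records can be sketched minimally. This is not the paper's method; the field names (`intent`, `entities`, `outcome`) and the word-count token proxy are illustrative assumptions.

```python
# Hypothetical sketch of structured distillation for conversation memory:
# each verbatim exchange is reduced to a compact structured record, and only
# the records are kept in the prompt. Field names are illustrative.

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def distill(exchange: dict) -> str:
    # Keep only the fields needed for later retrieval, dropping the transcript.
    return (f"[{exchange['turn']}] intent={exchange['intent']} "
            f"entities={','.join(exchange['entities'])} "
            f"outcome={exchange['outcome']}")

exchange = {
    "turn": 17,
    "intent": "book-flight",
    "entities": ["SFO", "JFK", "2024-05-01"],
    "outcome": "confirmed",
    "verbatim": "User: I'd like to book a flight (long multi-sentence transcript)",
}
record = distill(exchange)
```

The compression win comes from the record being schema-shaped rather than free text: retrieval only needs the distilled fields, so the transcript never re-enters the context.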
AI · Bullish · arXiv · CS AI · Mar 16 · 6/10
🧠 Researchers introduce Cheers, a unified multimodal AI model that combines visual comprehension and generation by decoupling patch details from semantic representations. The model achieves 4x token compression and outperforms existing models like Tar-1.5B while using only 20% of the training cost.
AI · Bullish · arXiv · CS AI · Mar 3 · 6/10
🧠 Researchers propose TC-SSA, a token compression framework that enables large vision-language models to process gigapixel pathology images by reducing visual tokens to 1.7% of original size while maintaining diagnostic accuracy. The method achieves 78.34% overall accuracy on SlideBench and demonstrates strong performance across multiple cancer classification tasks.
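The mechanics of keeping only a 1.7% fraction of patch tokens can be shown with a simple importance-based pruning sketch. This is an assumption-laden illustration, not TC-SSA itself: in practice the scores would come from the model, while random values stand in here.

```python
# Hypothetical sketch of importance-based visual token pruning: keep only the
# top-scoring fraction of patch tokens (1.7%, matching the summary above).
import random

def prune_tokens(tokens, scores, keep_fraction=0.017):
    keep = max(1, int(len(tokens) * keep_fraction))
    # Rank patches by importance, then restore original spatial order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep])
    return [tokens[i] for i in kept_indices]

random.seed(0)
tokens = [f"patch_{i}" for i in range(10_000)]  # gigapixel slide => many patches
scores = [random.random() for _ in tokens]      # stand-in for model importance
kept = prune_tokens(tokens, scores)
```

Restoring the original order after selection matters for vision tokens, since positional structure is part of what the downstream model consumes.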
AI · Bullish · arXiv · CS AI · Mar 3 · 6/10
🧠 Researchers developed CaCoVID, a reinforcement learning-based algorithm that compresses video tokens for large language models by selecting tokens based on their actual contribution to correct predictions rather than attention scores. The method uses combinatorial policy optimization to reduce computational overhead while maintaining video understanding performance.
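The contrast between attention-based and contribution-based selection can be illustrated with a toy greedy selector. This is not CaCoVID's combinatorial policy optimization; the greedy loop and `toy_quality` function are simplified stand-ins for scoring tokens by their effect on prediction quality.

```python
# Hypothetical sketch of contribution-based token selection: instead of
# ranking video tokens by attention, score each candidate by how much adding
# it improves a (mock) prediction-quality function, and keep tokens greedily.

def greedy_select(tokens, quality, budget):
    selected = []
    remaining = list(tokens)
    for _ in range(budget):
        # Pick the token whose inclusion most improves prediction quality.
        best = max(remaining, key=lambda t: quality(selected + [t]))
        if quality(selected + [best]) <= quality(selected):
            break  # no remaining token contributes anything
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy quality: frames 3 and 7 carry the answer; the others add nothing,
# even if an attention-based ranking might have scored them highly.
def toy_quality(subset):
    return len({3, 7} & set(subset))

frames = list(range(10))
chosen = greedy_select(frames, toy_quality, budget=4)
```

Note that the selector stops at two frames despite a budget of four: contribution-based scoring naturally discards tokens that do not change the prediction, which is the behavior the summary attributes to the method.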
AI · Neutral · arXiv · CS AI · Mar 3 · 4/10
🧠 Researchers introduce EfficientPosterGen, an AI framework that automatically converts research papers into academic posters using semantic-aware retrieval and token compression techniques. The system addresses key limitations of existing multimodal language models, reducing token consumption while maintaining poster quality through visual-based context compression and deterministic layout violation detection.