#model-efficiency News & Analysis

117 articles tagged with #model-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

Researchers propose TRIM-KV, an approach that learns token importance for memory-bounded LLM inference through lightweight retention gates, targeting the quadratic cost of self-attention and the growth of the key-value cache. The method outperforms existing eviction baselines across multiple benchmarks, and its learned retention scores offer insight into LLM interpretability.
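To make the idea of score-based eviction under a memory budget concrete, here is a minimal sketch. It is not TRIM-KV itself: the scores below are made-up numbers standing in for learned retention scores, and the function name `evict_to_budget` is hypothetical.

```python
import heapq


def evict_to_budget(cache, scores, budget):
    """Keep the `budget` highest-scoring cache entries, in their original order.

    `cache` and `scores` are parallel lists; `scores` stands in for
    per-token retention scores (assumed here, not taken from the paper).
    """
    if budget >= len(cache):
        return list(cache)
    # Indices of the top-`budget` scores.
    keep = set(heapq.nlargest(budget, range(len(cache)), key=lambda i: scores[i]))
    # Preserve positional order of the surviving entries.
    return [entry for i, entry in enumerate(cache) if i in keep]


tokens = ["The", "cat", "sat", "on", "the", "mat"]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.7]  # illustrative values only
kept = evict_to_budget(tokens, scores, budget=3)
print(kept)
```

In a real system the cache entries would be per-layer key and value tensors rather than strings, and eviction would run incrementally as new tokens arrive.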

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5

Ruyi2 Technical Report

Ruyi2 is an adaptive large language model that achieves 2-3x speedup over its predecessor while maintaining comparable performance to Qwen3 models. The model introduces a 'Familial Model' approach using 3D parallel training and establishes a 'Train Once, Deploy Many' paradigm for efficient AI deployment.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 9

Sparse Attention Post-Training for Mechanistic Interpretability

Researchers have developed a post-training method that sparsifies transformer attention while maintaining performance, reducing attention connectivity to just 0.4% of edges (a 99.6% reduction) in models up to 7B parameters. The result suggests that most attention connectivity is redundant and enables more interpretable AI models through simplified circuit structures.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory

Researchers have developed a unified framework using Spectral Geometry and Random Matrix Theory to address reliability and efficiency challenges in large language models. The study introduces EigenTrack for real-time hallucination detection and RMT-KD for model compression while maintaining accuracy.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation

Researchers introduce LoSATok, a novel audio tokenizer that compresses high-dimensional semantic features into 128-dimensional representations while preserving understanding and generation capabilities. The innovation combines semantic bottleneck compression with dual-level supervision to improve performance for speech, music, and audio generation tasks across diffusion transformer models.

AI · Neutral · arXiv – CS AI · 3d ago · 5/10

GraD-IBD: Graph Representation Learning from Diagnosis Trajectories for Early Detection of Inflammatory Bowel Disease

Researchers propose GraD-IBD, a graph-based machine learning model that analyzes patient diagnosis histories encoded in ICD codes to detect inflammatory bowel disease risk earlier and more efficiently than existing sequential models. The approach reformulates longitudinal diagnostic trajectories as temporally directed graphs with a novel message-passing mechanism, demonstrating improved accuracy while reducing computational complexity.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Cross-Entropy Games and Frost Training

Researchers introduce Frost Training, a novel method that applies gradient-based optimization from embedding space to improve LLM policy training on Cross-Entropy Games. The technique leverages signals previously used only in adversarial jailbreaking to accelerate model performance, achieving higher quality outputs faster in Monte Carlo-based optimization tasks.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Apple Intelligence Foundation Language Models

Apple has published research on foundation language models powering Apple Intelligence, including a 3 billion parameter on-device model and a larger server-based model for Private Cloud Compute. The announcement demonstrates Apple's commitment to developing efficient, responsible AI systems that balance performance with privacy.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

Researchers propose SelfJudge, a new method for accelerating large language model inference through self-supervised judge verification that eliminates the need for human annotations. The approach trains verifiers to assess whether token substitutions preserve semantic meaning, enabling faster inference without sacrificing accuracy across diverse NLP tasks.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving

Researchers introduce BIRDS, a framework measuring biodiversity impacts from large language model serving beyond traditional carbon and water metrics. The study reveals that LLM deployment causes ecosystem damage through operational and embodied biodiversity pathways, with impacts scaling significantly across different models, GPUs, and regions.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Researchers introduce Vision-OPD, a self-distillation framework that improves multimodal large language models' ability to detect fine-grained visual details by training full-image models to match the performance of crop-focused models. The technique achieves competitive results against larger models without requiring external teachers, labels, or inference-time tools, addressing a critical weakness in current MLLMs.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

Data-Efficient On-Policy Distillation for Automatic Speech Recognition

Researchers demonstrate that a 0.6B-parameter ASR model trained on 100k hours of speech can achieve competitive performance with larger models through teacher-guided on-policy distillation, reducing the audio data requirements by 99.5% compared to industry standards while closing the capability gap with 1.7B parameter models.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Cross-scale Aligned Supervision for Training GANs

Researchers propose CAT (Cross-scale Aligned Transformer), a new GAN training method that addresses the cross-scale trajectory misalignment problem in multi-stage image generation. By adding consistency regularization between intermediate and final outputs, CAT achieves state-of-the-art results on ImageNet-256 with one-step inference, reaching FID-50K of 1.56 after just 60 training epochs.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

Researchers propose PIPO (Pair-In, Pair-Out), a novel technique that combines input compression and multi-token prediction to accelerate large language model inference. The method eliminates expensive verification steps while achieving up to 2.64x speedups in first-token latency and demonstrating significant improvements on reasoning benchmarks.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

MetaSICL: Adapting Auditory LLM via Meta Speech In-Context Learning

Researchers introduce MetaSICL, a post-training method that enhances auditory large language models' ability to learn from in-context demonstrations without fine-tuning. The approach uses high-resource speech data to improve performance on low-resource tasks, outperforming traditional fine-tuning methods when labeled data is scarce or domain-mismatched.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

READER: Reasoning-Enhanced AI-Generated Text Detection

Researchers have developed READER, a compact AI text detector with only 1.5B parameters that outperforms much larger language models and existing detection systems. READER combines classification with explainable reasoning, providing both AI/human verdicts and structured rationales for its decisions, addressing critical limitations in current detection methods that fail under distribution shifts.

🧠 GPT-5 · 🧠 Gemini

AI · Bullish · Hugging Face Blog · May 19 · 6/10

OlmoEarth v1.1: A more efficient family of Earth observation models

The Allen Institute for AI (Ai2) has released OlmoEarth v1.1, an improved family of Earth observation models designed for satellite imagery analysis with enhanced efficiency and performance. The update represents progress in open-source geospatial AI, enabling broader access to tools for climate monitoring, disaster response, and environmental analysis.

AI · Bullish · Hugging Face Blog · May 14 · 6/10

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

IBM has released Granite Embedding Multilingual R2, an open-source multilingual embedding model under the Apache 2.0 license with a 32K context length. The model stays under 100M parameters while delivering retrieval quality competitive with larger models, broadening access to advanced embeddings for developers and enterprises.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

Researchers introduce TAD, a temporal-aware self-distillation framework that improves diffusion large language models' accuracy-parallelism trade-off by using adaptive loss functions based on token decoding timelines. The method increases accuracy from 46.2% to 51.6% while enabling aggressive acceleration modes, addressing a fundamental limitation in parallel text generation.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

CERSA: Cumulative Energy-Retaining Subspace Adaptation for Memory-Efficient Fine-Tuning

Researchers introduce CERSA, a novel parameter-efficient fine-tuning method that uses singular value decomposition to reduce memory consumption while fine-tuning large language models. The technique outperforms existing methods like LoRA by capturing more rank characteristics of weight modifications while requiring substantially less memory for frozen weights.
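As a rough illustration of what "cumulative energy retention" usually means for a singular value spectrum, the sketch below picks the smallest rank whose squared singular values cover a chosen fraction of the total. This is a common convention, assumed here; the summary does not state CERSA's actual selection rule, and `rank_for_energy` and the threshold values are hypothetical.

```python
def rank_for_energy(singular_values, threshold=0.9):
    """Smallest rank r whose top-r squared singular values reach
    `threshold` of the total energy.

    Assumes `singular_values` is sorted in descending order, as an SVD
    routine would return it.
    """
    energies = [s * s for s in singular_values]
    total = sum(energies)
    if total == 0:
        return 0  # an all-zero matrix needs no components
    running = 0.0
    for r, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= threshold:
            return r
    return len(energies)


spectrum = [4.0, 2.0, 1.0, 1.0]  # illustrative values only
print(rank_for_energy(spectrum, threshold=0.9))
print(rank_for_energy(spectrum, threshold=0.95))
```

A higher threshold keeps more components and therefore more memory, so the threshold acts as the knob between fidelity and footprint.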

AI · Neutral · arXiv – CS AI · May 12 · 6/10

SDG-MoE: Signed Debate Graph Mixture-of-Experts

Researchers introduce SDG-MoE, a novel mixture-of-experts architecture that enables deliberation among routed experts through signed graph communication before output aggregation. The model demonstrates 19.8% perplexity improvement over vanilla MoE and achieves state-of-the-art results on multiple language modeling benchmarks while maintaining computational efficiency.

🏢 Perplexity

AI · Neutral · arXiv – CS AI · May 12 · 6/10

DAPE: Dynamic Non-uniform Alignment and Progressive Detail Enhancement Techniques for Improving the Performance of Efficient Visual Language Models

Researchers propose DAPE, a novel framework for visual-language models that uses dynamic, non-uniform alignment between text and image data rather than traditional uniform approaches. The method improves model accuracy across downstream tasks while reducing computational overhead by intelligently matching varying amounts of visual information to text segments based on their information density.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers

Researchers studying one-layer Transformers discovered that architectural choices in feedforward networks (FFNs)—particularly sparse mixture-of-experts (MoE) routing—fundamentally reshape how attention mechanisms learn to compute, with sparsity rather than learned specialization driving this computational redistribution.
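For readers unfamiliar with sparse MoE routing, the following is a minimal sketch of generic top-k gating: a softmax over router logits, selection of the k most probable experts, and renormalization of their weights. It illustrates the general mechanism only and is not the setup used in the paper; `top_k_route` and the example logits are assumptions.

```python
import math


def top_k_route(logits, k=2):
    """Return {expert_index: weight} for the k highest-probability experts.

    Weights are softmax probabilities renormalized over the chosen experts,
    so they sum to 1. All other experts are skipped for this token.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


weights = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)  # illustrative logits
print(weights)
```

Because only k experts run per token, the feedforward compute per token stays roughly constant as the number of experts grows.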

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Mixture of Layers with Hybrid Attention

Researchers introduce Mixture of Layers (MoL), a novel architecture that extends Mixture-of-Experts concepts from individual experts to entire transformer blocks, using parallel thin blocks with learned routing. The approach incorporates hybrid attention combining global softmax with linear attention to address token coverage limitations in sparse routing systems.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning

Researchers introduce LiteGUI, a novel training framework that enhances lightweight GUI agents (2B-3B parameters) through reinforcement learning and knowledge distillation, achieving competitive performance with much larger models. The approach addresses key limitations of traditional supervised fine-tuning by incorporating multi-solution learning and dynamic retrieval mechanisms to reduce hallucinations in automated interface interaction tasks.
