AI · Bullish · arXiv – CS AI · Jun 9 · 7/10
🧠Researchers introduce RAPID, a depth-aware token reduction framework for Vision Transformers that uses different pruning and merging strategies across network layers to reduce computational costs while maintaining accuracy. The method achieves superior performance compared to existing approaches like ToMe, with up to 4.29% higher accuracy in aggressive compression scenarios.
AI · Bullish · arXiv – CS AI · Jun 9 · 7/10
🧠Researchers introduce I-Segmenter, the first fully integer-only Vision Transformer framework for semantic segmentation that eliminates floating-point operations to enable efficient deployment on resource-constrained devices. The model achieves only 5.1% accuracy loss compared to standard floating-point versions while reducing model size by 3.8x and improving inference speed by 1.2x, with a novel activation function addressing quantization challenges.
AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠Researchers introduce LLMCodec, a novel compression method that adapts video codecs like VVC/H.266 to efficiently compress large language models. The approach achieves significant improvements over existing quantization methods, reducing perplexity by 1.5x on LLaMA-3-8B at 2-bit precision while improving downstream task accuracy by 21%.
🏢 Perplexity
AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠Researchers introduce Drive-KD, a knowledge distillation framework that compresses large vision-language models for autonomous driving by decomposing the task into perception, reasoning, and planning components. The method achieves superior performance with 42x less GPU memory and 11.4x higher throughput compared to larger baseline models, advancing the practical deployment of AI in safety-critical driving systems.
🧠 GPT-5
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠FreqLite is a new lightweight linear model for long-term time-series forecasting that uses frequency decomposition and adaptive normalization to achieve better accuracy than larger transformer models while requiring 4x fewer parameters and significantly less computational resources. The method introduces Adaptive Reversible Instance Normalization (A-RevIN) to handle non-stationary data more effectively than existing approaches.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers propose ASKD-Whisper, a new knowledge distillation technique that compresses OpenAI's Whisper speech recognition model while improving performance. The method achieves 5x faster inference and 1.07% lower error rates than the original teacher model by dynamically reducing reliance on the teacher's predictions during training.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers present AVIC, an adaptive framework that optimizes when and how much multimodal language models should use world models for visual imagination during spatial reasoning tasks. The system learns to selectively invoke visual imagination only when necessary, reducing computational costs while matching or exceeding performance of fixed imagination strategies and proprietary baselines like GPT-4o.
🧠 GPT-4
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers introduce DyLLM, a training-free inference framework that accelerates diffusion language model decoding by up to 9.6x by selectively computing only salient tokens rather than processing entire sequences at each step. The approach identifies important tokens through attention context similarity and reuses cached activations for stable tokens, maintaining baseline accuracy across benchmarks.
AI · Bullish · arXiv – CS AI · May 29 · 7/10
🧠Pocket-Dentist presents an efficiency-aware benchmark for dental image analysis using compact multimodal vision-language models, demonstrating that smaller 2B-parameter models outperform larger counterparts while consuming significantly fewer computational resources. Successfully deployed on iPhone hardware, the approach enables privacy-preserving dental prescreening outside specialist centers with practical latency and memory constraints.
AI · Bullish · arXiv – CS AI · May 27 · 7/10
🧠MiniMax introduces the M2 series, a Mixture-of-Experts language model with 229.9B total parameters but only 9.8B activated per token, achieving frontier-tier performance on agentic tasks through agent-driven data pipelines and a custom reinforcement learning system called Forge. The M2.7 checkpoint demonstrates early self-evolution capabilities, autonomously debugging and modifying its own training scaffold.
AI · Bullish · arXiv – CS AI · May 27 · 7/10
🧠Researchers introduce JetViT, a hybrid Vision Transformer architecture that maintains accuracy of state-of-the-art models while delivering up to 1.79x faster throughput and 44.81% lower latency on high-resolution images. The innovation uses post-training attention search to convert full-attention models into efficient hybrid variants by strategically replacing redundant attention blocks.
🏢 Nvidia
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce AHD Agent, a reinforcement learning framework that enables language models to autonomously design heuristics for solving complex combinatorial optimization problems. A 4-billion-parameter model achieves performance comparable to much larger systems while requiring significantly fewer computational evaluations, advancing the frontier of AI-driven algorithm design.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce BaLoRA, a Bayesian extension of Low-Rank Adaptation that improves fine-tuning of large AI models by adding uncertainty quantification while narrowing the accuracy gap with full fine-tuning. The method uses input-adaptive parameterization with minimal computational overhead and demonstrates stronger performance across language, vision, and materials science tasks.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Zyphra has released ZAYA1-VL-8B, a compact mixture-of-experts vision-language model whose performance is competitive with larger systems while using significantly fewer active parameters. The model introduces vision-specific LoRA adapters and bidirectional attention mechanisms to enhance visual understanding, representing meaningful progress in efficient AI model design.
🏢 Hugging Face
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Zyphra has unveiled ZAYA1-8B, a compact reasoning-focused AI model with only 700M active parameters that matches larger competitors like DeepSeek-R1 on mathematics and coding tasks. The model introduces Markovian RSA, a novel test-time compute method that achieves 91.9% on AIME'25 benchmarks while maintaining computational efficiency, suggesting small models can compete with much larger reasoning systems through architectural innovation.
🧠 GPT-5🧠 Gemini
AI · Neutral · arXiv – CS AI · Jun 5 · 6/10
🧠EGTR-Review presents a novel framework for automating scientific peer review using a multi-agent teacher model that distills its reasoning into a lightweight student model, achieving superior performance with significantly lower computational costs while maintaining evidence traceability and factual grounding.
AI · Neutral · arXiv – CS AI · Jun 4 · 6/10
🧠Researchers propose an enhanced medical image segmentation framework by integrating a lightweight Box Predictor module into MedSAM, which estimates bounding boxes from single user clicks to improve segmentation accuracy across CT, MRI, and ultrasound imaging. The method adds minimal computational overhead (1.6M parameters) while achieving strong Dice scores across four diverse medical imaging datasets.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers propose a lightweight temporal convolutional network enhanced with physics-guided attention mechanisms for WiFi-based human activity recognition. The approach uses Doppler-energy and variance-driven attention to capture motion dynamics more efficiently than deep learning baselines, achieving better performance with fewer parameters.
AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠M-DESIGN, a new retrieval-augmented framework, addresses the inefficiency gap between expensive neural architecture search and suboptimal model retrieval by dynamically leveraging historical evidence from prior tasks to discover near-optimal network modifications. Tested on 67,760 graph neural networks across 22 datasets, the method achieves state-of-the-art performance in 79% of cases under computational constraints.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Compressed Video Aggregator (CVA), a lightweight module that improves micro-video recommendation systems by decoupling video processing from preference learning. The method reduces training time and GPU memory by orders of magnitude while maintaining or improving performance through intelligent frame selection based on video titles.
AI · Bullish · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce NOVA, a world modeling framework that represents scene state as weights in implicit neural representations (INRs) rather than traditional encoded latent spaces. The approach eliminates decoder bottlenecks, achieves structural disentanglement of scene components, and enables controllable video generation on consumer GPUs with only 40M parameters.
AI · Bullish · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce UniSD, a unified self-distillation framework that systematically improves large language model adaptation without requiring external teacher models. The framework combines multiple complementary mechanisms and demonstrates consistent performance gains of +5.4 points over baseline models across six benchmarks, advancing efficient LLM training techniques.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce Budgeted LoRA, a distillation framework that compresses large language models by treating model compression as a structured compute allocation problem. The method achieves up to 4.05x speedup in inference through selective dense component removal and adaptive low-rank allocation, controlled by a single compute budget parameter.
🏢 Perplexity
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠Researchers have developed Efficient3D, a framework that accelerates 3D Multimodal Large Language Models (MLLMs) while maintaining accuracy through adaptive token pruning. The system uses a Debiased Visual Token Importance Estimator and Adaptive Token Rebalancing to reduce computational overhead without sacrificing performance, showing +2.57% CIDEr improvement on benchmarks.
AI · Bullish · Hugging Face Blog · Jun 3 · 6/10 · 6
🧠SmolVLA is a new, efficient vision-language-action model trained on data from the LeRobot community. Models of this kind process visual and language inputs to generate actions, which could improve robotic and automation applications.