AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠Researchers have developed Meissa, a lightweight 4B-parameter medical AI model that brings advanced agentic capabilities offline for healthcare applications. The system matches frontier models like GPT on medical benchmarks while using 25x fewer parameters and running at 22x lower latency, addressing privacy and cost concerns in clinical settings.
🧠 Gemini
AI · Neutral · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers propose a framework for decentralized resource allocation in real-time AI services across device-edge-cloud infrastructure. The study shows that dependency graph topology determines whether price-based allocation can work at scale, with hierarchical structures enabling stable pricing while complex dependencies cause instability.
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
🧠A research paper presents a 10-year roadmap for coordinated AI and hardware co-development, targeting 1000x efficiency improvements in AI training and inference by 2035. The vision emphasizes energy efficiency over raw compute scaling, proposing integrated solutions across algorithms, architectures, and systems to enable sustainable AI deployment from cloud to edge environments.
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
🧠Researchers developed a memory management system for multi-agent AI systems on edge devices that reduces memory requirements by 4x through 4-bit quantization and eliminates redundant computation by persisting KV caches to disk. The solution reduces time-to-first-token by up to 136x while maintaining minimal impact on model quality across three major language model architectures.
🏢 Perplexity · 🧠 Llama
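The 4x memory figure in the item above is consistent with storing cached values in 4 bits instead of 16. The following is a rough back-of-the-envelope sketch, assuming a 16-bit baseline and ignoring the small overhead of quantization scales. The model dimensions are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits):
    """Approximate KV cache size: keys and values for every layer and token."""
    elements = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys + values
    return elements * bits // 8

# Hypothetical dimensions (assumption, not from the source).
dims = dict(layers=32, kv_heads=8, head_dim=128, seq_len=4096)

fp16 = kv_cache_bytes(**dims, bits=16)
int4 = kv_cache_bytes(**dims, bits=4)

print(f"16-bit cache: {fp16 / 2**20:.0f} MiB")
print(f"4-bit cache:  {int4 / 2**20:.0f} MiB")
print(f"reduction:    {fp16 / int4:.0f}x")
```

The ratio depends only on the bit widths, so it holds for any choice of dimensions; the absolute sizes do not.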
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers developed LiteVLA-Edge, a deployment-oriented Vision-Language-Action model pipeline that enables fully on-device inference on embedded robotics hardware like Jetson Orin. The system achieves 150.5ms latency (6.6Hz) through FP32 fine-tuning combined with 4-bit quantization and GPU-accelerated inference, operating entirely offline within a ROS 2 framework.
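The two figures in the item above are the same measurement in different units. A minimal sketch of the conversion, assuming inferences run back to back with no pipelining or batching (the function name is illustrative):

```python
def latency_to_rate_hz(latency_ms: float) -> float:
    """Convert per-inference latency in milliseconds to a rate in Hz."""
    return 1000.0 / latency_ms

rate = latency_to_rate_hz(150.5)
print(f"{rate:.1f} Hz")
```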
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers propose a new framework for Agentic Peer-to-Peer Networks where AI agents on edge devices can collaborate by sharing capabilities and actions rather than static files. The system introduces tiered verification methods to ensure security and reliability when AI agents delegate tasks to untrusted peers in decentralized networks.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers propose an Adaptive Quantized Planetary Crater Detection System (AQ-PCDSys) that uses quantized neural networks and multi-sensor fusion to enable real-time AI-powered crater detection on resource-constrained space exploration hardware. The system addresses the critical bottleneck of deploying sophisticated deep learning models on power-limited, radiation-hardened space computers.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠Researchers developed TinyIceNet, a compact AI model for real-time sea ice mapping using satellite SAR imagery, designed specifically for on-board FPGA processing in space. The system achieves 75.216% F1 score while consuming 50% less energy than GPU baselines, demonstrating practical AI deployment for maritime navigation in polar regions.
$NEAR
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠NeuroSkill is a new open-source AI system that models human mental states in real-time using brain-computer interfaces and biophysical signals. The system runs offline on edge devices and can engage with humans on cognitive and emotional levels through its NeuroLoop harness technology.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers developed NANOMIND, a software-hardware framework that optimizes Large Multimodal Models for battery-powered devices by breaking them into modular components and mapping each to optimal accelerators. The system achieves 42.3% energy reduction and enables 20.8 hours of operation running LLaVA-OneVision on a compact device without network connectivity.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2
🧠ButterflyMoE reduces memory requirements for mixture-of-experts models by 150× by using geometric parameterization instead of storing an independent weight matrix for each expert. The method uses shared ternary prototypes with learned rotations to achieve sub-linear memory scaling, enabling deployment of multiple experts on edge devices.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠CSRv2 introduces a new training approach for ultra-sparse embeddings that reduces inactive neurons from 80% to 20% while delivering 14% accuracy gains. The method achieves 7x speedup over existing approaches and up to 300x improvements in compute and memory efficiency compared to dense embeddings.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers propose 'Intelligence per Watt' (IPW) as a metric to measure AI efficiency, finding that local AI models can handle 71.3% of queries while being 1.4x more energy efficient than cloud alternatives. The study demonstrates that smaller local language models (≤20B parameters) can redistribute computational demand from centralized cloud infrastructure.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers introduce UniQL, a unified framework for quantizing and compressing large language models to run efficiently on mobile devices. The system achieves 4x-5.7x memory reduction and 2.7x-3.4x speed improvements while maintaining accuracy within 5% of original models.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves 1.3-3.6x speedup on mixed-precision models while supporting higher clock frequencies up to 250MHz, addressing the trade-off between hardware efficiency and inference accuracy.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.
AI · Bullish · IEEE Spectrum – AI · Feb 9 · 7/10 · 5
🧠Researchers at UC San Diego developed a new type of bulk resistive RAM (RRAM) that overcomes traditional limitations by switching entire layers rather than forming filaments. The technology achieved 90% accuracy in AI learning tasks and could enable more efficient edge computing by allowing computation within memory itself.
AI · Bullish · Google DeepMind Blog · May 20 · 7/10 · 5
🧠Google announces Gemma 3n preview, a new open-source AI model optimized for mobile devices with multimodal capabilities including audio processing. The model features a unique 2-in-1 architecture designed to enable fast, interactive AI applications directly on devices.
AI · Bullish · Hugging Face Blog · Mar 7 · 7/10 · 8
🧠The article provides a guide for running Large Language Models (LLMs) directly on mobile devices using React Native, enabling edge inference. Running inference on the device reduces reliance on cloud-based services and can improve privacy and latency for mobile AI applications.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers present a systematic analysis of hybrid multi-agent systems that combine cloud-based large language models with on-device small language models. They find that the optimal architecture is highly task-dependent and that more frontier compute does not guarantee better results on the power-cost-accuracy Pareto frontier.
AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce UI-KOBE, a framework that enhances lightweight mobile GUI agents by combining them with app-specific knowledge graphs to enable more reliable task automation on mobile devices. This approach reduces dependency on large vision-language models, lowering inference costs and improving privacy by enabling on-device deployment without sacrificing performance.
AI · Bullish · Decrypt – AI · 2d ago · 6/10
🧠A London startup successfully compressed 4.1 million recipes across seven languages into a 2-megabyte AI model, demonstrating dramatic efficiency gains in machine learning. This achievement highlights how modern compression techniques and optimized neural architectures enable powerful AI systems to run on minimal computational resources.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce DAROM, a reinforcement learning framework designed to handle stochastic communication delays in autonomous vehicle highway merging scenarios. The system uses a delay-aware encoder to maintain decision-making performance despite V2I transmission latencies up to 2.0 seconds, achieving over 99% success rates in high-density traffic conditions.