y0news

#edge-computing News & Analysis

131 articles tagged with #edge-computing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Meissa: Multi-modal Medical Agentic Intelligence

Researchers have developed Meissa, a lightweight 4B-parameter medical AI model that brings advanced agentic capabilities offline for healthcare applications. The system matches frontier models such as GPT on medical benchmarks while using 25x fewer parameters and running at 22x lower latency, addressing privacy and cost concerns in clinical settings.

AI · Neutral · arXiv – CS AI · Mar 9 · 7/10

Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum

Researchers propose a framework for decentralized resource allocation in real-time AI services across device-edge-cloud infrastructure. The study shows that dependency graph topology determines whether price-based allocation can work at scale, with hierarchical structures enabling stable pricing while complex dependencies cause instability.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

AI+HW 2035: Shaping the Next Decade

A research paper presents a 10-year roadmap for coordinated AI and hardware co-development, targeting 1000x efficiency improvements in AI training and inference by 2035. The vision emphasizes energy efficiency over raw compute scaling, proposing integrated solutions across algorithms, architectures, and systems to enable sustainable AI deployment from cloud to edge environments.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices

Researchers developed a memory management system for multi-agent AI systems on edge devices that reduces memory requirements by 4x through 4-bit quantization and eliminates redundant computation by persisting KV caches to disk. The solution reduces time-to-first-token by up to 136x while maintaining minimal impact on model quality across three major language model architectures.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics

Researchers developed LiteVLA-Edge, a deployment-oriented Vision-Language-Action model pipeline that enables fully on-device inference on embedded robotics hardware like Jetson Orin. The system achieves 150.5 ms latency (6.6 Hz) through FP32 fine-tuning combined with 4-bit quantization and GPU-accelerated inference, operating entirely offline within a ROS 2 framework.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing

Researchers propose a new framework for Agentic Peer-to-Peer Networks where AI agents on edge devices can collaborate by sharing capabilities and actions rather than static files. The system introduces tiered verification methods to ensure security and reliability when AI agents delegate tasks to untrusted peers in decentralized networks.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Adaptive Quantized Planetary Crater Detection System for Autonomous Space Exploration

Researchers propose an Adaptive Quantized Planetary Crater Detection System (AQ-PCDSys) that uses quantized neural networks and multi-sensor fusion to enable real-time AI-powered crater detection on resource-constrained space exploration hardware. The system addresses the critical bottleneck of deploying sophisticated deep learning models on power-limited, radiation-hardened space computers.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference

Researchers developed TinyIceNet, a compact AI model for real-time sea ice mapping using satellite SAR imagery, designed specifically for on-board FPGA processing in space. The system achieves 75.216% F1 score while consuming 50% less energy than GPU baselines, demonstrating practical AI deployment for maritime navigation in polar regions.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

Researchers developed NANOMIND, a software-hardware framework that optimizes Large Multimodal Models for battery-powered devices by breaking them into modular components and mapping each to optimal accelerators. The system achieves 42.3% energy reduction and enables 20.8 hours of operation running LLaVA-OneVision on a compact device without network connectivity.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

HierarchicalPrune: Position-Aware Compression for Large-Scale Diffusion Models

Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits

ButterflyMoE reduces the memory requirements of mixture-of-experts models by 150× by using geometric parameterization instead of storing independent weight matrices. The method uses shared ternary prototypes with learned rotations to achieve sub-linear memory scaling, enabling deployment of multiple experts on edge devices.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

CSRv2: Unlocking Ultra-Sparse Embeddings

CSRv2 introduces a new training approach for ultra-sparse embeddings that reduces inactive neurons from 80% to 20% while delivering 14% accuracy gains. The method achieves 7x speedup over existing approaches and up to 300x improvements in compute and memory efficiency compared to dense embeddings.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

Researchers propose 'Intelligence per Watt' (IPW) as a metric to measure AI efficiency, finding that local AI models can handle 71.3% of queries while being 1.4x more energy efficient than cloud alternatives. The study demonstrates that smaller local language models (≤20B parameters) can redistribute computational demand from centralized cloud infrastructure.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

Researchers introduce UniQL, a unified framework for quantizing and compressing large language models to run efficiently on mobile devices. The system achieves 4x-5.7x memory reduction and 2.7x-3.4x speed improvements while maintaining accuracy within 5% of original models.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves a 1.3-3.6x speedup on mixed-precision models while supporting clock frequencies of up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI

Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.

AI · Bullish · IEEE Spectrum – AI · Feb 9 · 7/10

New Devices Might Scale the Memory Wall

Researchers at UC San Diego developed a new type of bulk resistive RAM (RRAM) that overcomes traditional limitations by switching entire layers rather than forming filaments. The technology achieved 90% accuracy in AI learning tasks and could enable more efficient edge computing by allowing computation within memory itself.

AI · Bullish · Google DeepMind Blog · May 20 · 7/10

Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI

Google announces Gemma 3n preview, a new open-source AI model optimized for mobile devices with multimodal capabilities including audio processing. The model features a unique 2-in-1 architecture designed to enable fast, interactive AI applications directly on devices.

AI · Bullish · Hugging Face Blog · Mar 7 · 7/10

LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

The article is a guide to running Large Language Models (LLMs) directly on mobile devices using React Native. On-device inference of this kind reduces reliance on cloud-based services and improves privacy and latency for mobile AI applications.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Researchers present a systematic analysis of hybrid multi-agent systems that combine cloud-based large language models with on-device small language models. The analysis finds that optimal architecture design is highly task-dependent and that more frontier compute does not guarantee better performance on the power-cost-accuracy Pareto frontier.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

Researchers introduce UI-KOBE, a framework that enhances lightweight mobile GUI agents by combining them with app-specific knowledge graphs to enable more reliable task automation on mobile devices. This approach reduces dependency on large vision-language models, lowering inference costs and improving privacy by enabling on-device deployment without sacrificing performance.

AI · Bullish · Decrypt – AI · 2d ago · 6/10

This AI Compressed 'All Human Cooking' Into 2 Megabytes

A London startup successfully compressed 4.1 million recipes across seven languages into a 2-megabyte AI model, demonstrating dramatic efficiency gains in machine learning. This achievement highlights how modern compression techniques and optimized neural architectures enable powerful AI systems to run on minimal computational resources.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency

Researchers introduce DAROM, a reinforcement learning framework designed to handle stochastic communication delays in autonomous vehicle highway merging scenarios. The system uses a delay-aware encoder to maintain decision-making performance despite V2I transmission latencies up to 2.0 seconds, achieving over 99% success rates in high-density traffic conditions.
