#edge-computing News & Analysis

77 articles tagged with #edge-computing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

77 articles

AIBullisharXiv – CS AI · 2d ago7/10

🧠

SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models

SVD-Prune introduces a training-free token pruning method for Vision-Language Models using Singular Value Decomposition to reduce computational overhead. The approach maintains model performance while drastically reducing vision tokens to 16-32, addressing efficiency challenges in multimodal AI systems without requiring retraining.

AIBullisharXiv – CS AI · 3d ago7/10

🧠

Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

Researchers introduce Ge²mS-T, a novel Spiking Vision Transformer architecture that optimizes energy efficiency while maintaining training and inference performance through multi-dimensional grouped computation. The approach addresses fundamental limitations in existing SNN paradigms by balancing memory overhead, learning capability, and energy consumption simultaneously.

AIBullisharXiv – CS AI · Mar 277/10

🧠

LLM4AD: Large Language Models for Autonomous Driving -- Concept, Review, Benchmark, Experiments, and Future Trends

Researchers have published a comprehensive review of Large Language Models for Autonomous Driving (LLM4AD), introducing new benchmarks and conducting real-world experiments on autonomous vehicle platforms. The paper explores how LLMs can enhance perception, decision-making, and motion control in self-driving cars, while identifying key challenges including latency, security, and safety concerns.

AIBullishTechCrunch – AI · Mar 267/10

🧠

Mistral releases a new open-source model for speech generation

Mistral has released a new open-source speech generation model that is lightweight enough to run on mobile devices including smartwatches and smartphones. This represents a significant advancement in making AI speech capabilities more accessible and portable for edge computing applications.

AIBullisharXiv – CS AI · Mar 267/10

🧠

The Cognitive Firewall:Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

Researchers developed the Cognitive Firewall, a hybrid edge-cloud defense system that protects browser-based AI agents from indirect prompt injection attacks. The three-stage architecture reduces attack success rates to below 1% while maintaining 17,000x faster response times compared to cloud-only solutions by processing simple attacks locally and complex threats in the cloud.

AIBullisharXiv – CS AI · Mar 177/10

🧠

PrototypeNAS: Rapid Design of Deep Neural Networks for Microcontroller Units

PrototypeNAS is a new zero-shot neural architecture search method that rapidly designs and optimizes deep neural networks for microcontroller units without requiring extensive training. The system uses a three-step approach combining structural optimization, ensemble zero-shot proxies, and Hypervolume subset selection to identify efficient models within minutes that can run on resource-constrained edge devices.

AIBullisharXiv – CS AI · Mar 177/10

🧠

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Researchers propose HO-SFL (Hybrid-Order Split Federated Learning), a new framework that enables memory-efficient fine-tuning of large AI models on edge devices by eliminating backpropagation on client devices while maintaining convergence speed comparable to traditional methods. The approach significantly reduces communication costs and memory requirements for distributed AI training.

AIBullisharXiv – CS AI · Mar 177/10

🧠

SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI

SPARQ introduces a unified framework combining spiking neural networks, quantization-aware training, and reinforcement learning-guided early exits for energy-efficient edge AI. The system achieves up to 5.15% higher accuracy than conventional quantized SNNs while reducing system energy consumption by over 330 times and cutting synaptic operations by over 90%.

AIBullisharXiv – CS AI · Mar 127/10

🧠

MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios

Researchers introduce MoE-SpAc, a new framework for efficient Mixture-of-Experts model inference on edge devices that achieves 42% improvement over existing baselines. The system uses speculative decoding as a memory management tool and demonstrates 4.04x average speedup across benchmarks.

AIBullisharXiv – CS AI · Mar 117/10

🧠

Meissa: Multi-modal Medical Agentic Intelligence

Researchers have developed Meissa, a lightweight 4B-parameter medical AI model that brings advanced agentic capabilities offline for healthcare applications. The system matches frontier models like GPT in medical benchmarks while operating with 25x fewer parameters and 22x lower latency, addressing privacy and cost concerns in clinical settings.

🧠 Gemini

AINeutralarXiv – CS AI · Mar 97/10

🧠

Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum

Researchers propose a framework for decentralized resource allocation in real-time AI services across device-edge-cloud infrastructure. The study shows that dependency graph topology determines whether price-based allocation can work at scale, with hierarchical structures enabling stable pricing while complex dependencies cause instability.

AIBullisharXiv – CS AI · Mar 67/10

🧠

AI+HW 2035: Shaping the Next Decade

A research paper presents a 10-year roadmap for coordinated AI and hardware co-development, targeting 1000x efficiency improvements in AI training and inference by 2035. The vision emphasizes energy efficiency over raw compute scaling, proposing integrated solutions across algorithms, architectures, and systems to enable sustainable AI deployment from cloud to edge environments.

AIBullisharXiv – CS AI · Mar 67/10

🧠

Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices

Researchers developed a memory management system for multi-agent AI systems on edge devices that reduces memory requirements by 4x through 4-bit quantization and eliminates redundant computation by persisting KV caches to disk. The solution reduces time-to-first-token by up to 136x while maintaining minimal impact on model quality across three major language model architectures.

🏢 Perplexity🧠 Llama

AIBullisharXiv – CS AI · Mar 56/10

🧠

LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics

Researchers developed LiteVLA-Edge, a deployment-oriented Vision-Language-Action model pipeline that enables fully on-device inference on embedded robotics hardware like Jetson Orin. The system achieves 150.5ms latency (6.6Hz) through FP32 fine-tuning combined with 4-bit quantization and GPU-accelerated inference, operating entirely offline within a ROS 2 framework.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Adaptive Quantized Planetary Crater Detection System for Autonomous Space Exploration

Researchers propose an Adaptive Quantized Planetary Crater Detection System (AQ-PCDSys) that uses quantized neural networks and multi-sensor fusion to enable real-time AI-powered crater detection on resource-constrained space exploration hardware. The system addresses the critical bottleneck of deploying sophisticated deep learning models on power-limited, radiation-hardened space computers.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing

Researchers propose a new framework for Agentic Peer-to-Peer Networks where AI agents on edge devices can collaborate by sharing capabilities and actions rather than static files. The system introduces tiered verification methods to ensure security and reliability when AI agents delegate tasks to untrusted peers in decentralized networks.

AIBullisharXiv – CS AI · Mar 47/102

🧠

NeuroSkill(tm): Proactive Real-Time Agentic System Capable of Modeling Human State of Mind

NeuroSkill is a new open-source AI system that models human mental states in real-time using brain-computer interfaces and biophysical signals. The system runs offline on edge devices and can engage with humans on cognitive and emotional levels through its NeuroLoop harness technology.

AIBullisharXiv – CS AI · Mar 46/102

🧠

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference

Researchers developed TinyIceNet, a compact AI model for real-time sea ice mapping using satellite SAR imagery, designed specifically for on-board FPGA processing in space. The system achieves 75.216% F1 score while consuming 50% less energy than GPU baselines, demonstrating practical AI deployment for maritime navigation in polar regions.

$NEAR

AIBullisharXiv – CS AI · Mar 37/104

🧠

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.

AIBullisharXiv – CS AI · Mar 37/105

🧠

HierarchicalPrune: Position-Aware Compression for Large-Scale Diffusion Models

Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.

AIBullisharXiv – CS AI · Mar 37/103

🧠

CSRv2: Unlocking Ultra-Sparse Embeddings

CSRv2 introduces a new training approach for ultra-sparse embeddings that reduces inactive neurons from 80% to 20% while delivering 14% accuracy gains. The method achieves 7x speedup over existing approaches and up to 300x improvements in compute and memory efficiency compared to dense embeddings.

AIBullisharXiv – CS AI · Mar 37/102

🧠

ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits

ButterflyMoE introduces a breakthrough approach to reduce memory requirements for AI expert models by 150× through geometric parameterization instead of storing independent weight matrices. The method uses shared ternary prototypes with learned rotations to achieve sub-linear memory scaling, enabling deployment of multiple experts on edge devices.

AIBullisharXiv – CS AI · Mar 37/104

🧠

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

Researchers developed NANOMIND, a software-hardware framework that optimizes Large Multimodal Models for battery-powered devices by breaking them into modular components and mapping each to optimal accelerators. The system achieves 42.3% energy reduction and enables 20.8 hours of operation running LLaVA-OneVision on a compact device without network connectivity.

AIBullisharXiv – CS AI · Feb 277/106

🧠

TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI

Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.

AIBullisharXiv – CS AI · Feb 277/108

🧠

UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

Researchers introduce UniQL, a unified framework for quantizing and compressing large language models to run efficiently on mobile devices. The system achieves 4x-5.7x memory reduction and 2.7x-3.4x speed improvements while maintaining accuracy within 5% of original models.

Page 1 of 4Next →