#edge-ai News & Analysis

36 articles tagged with #edge-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras

Researchers have developed CLANE, a neuromorphic hardware system deployed on Intel Loihi 2 that enables continuous learning of human actions from event cameras without forgetting previously learned classes. The system achieves 70.4% accuracy on a 50-class action recognition dataset while consuming 100x less energy and delivering 16x lower latency than conventional GPU-based approaches, advancing on-device AI for AR/VR and robotics applications.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients

Researchers propose a novel direct training algorithm for Spiking Neural Networks that addresses performance gaps with traditional ANNs through circulate-firing neurons, learnable surrogate gradients, and balanced loss functions. The method demonstrates competitive results across datasets and extends effectively to Transformer architectures, potentially advancing energy-efficient neural network applications.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

StreamSplit: Continuous Audio Representation Learning via Uncertainty-Guided Adaptive Splitting

StreamSplit introduces a novel framework enabling continuous contrastive learning on edge devices by dynamically partitioning computation between local and cloud resources. Using reinforcement learning and uncertainty guidance, the system reduces latency by up to 4.7x and bandwidth by 77.1% while maintaining near-server accuracy, making distributed AI inference practical for resource-constrained hardware.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI

A comprehensive empirical study reveals that weight pruning—a technique for compressing large language models for edge devices—paradoxically amplifies bias while preserving performance metrics. The research shows activation-aware pruning methods maintain perplexity but increase stereotype reliance by up to 84%, suggesting current evaluation methods fail to detect fairness degradation in compressed models.

🏢 Perplexity
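To make "activation-aware pruning" concrete, here is a minimal sketch, not the study's code. It assumes one common criterion: score each weight by its magnitude times a norm of the input activation it multiplies, then zero the lowest-scoring fraction of each output row. The names `prune_row`, `act_norms`, and `sparsity` are hypothetical.

```python
from typing import List


def prune_row(weights: List[float], act_norms: List[float], sparsity: float) -> List[float]:
    """Zero the lowest-scoring fraction of one output row.

    Assumed score: |weight| * activation norm of the matching input feature.
    Passing all-ones for act_norms reduces this to plain magnitude pruning.
    """
    if len(weights) != len(act_norms):
        raise ValueError("weights and act_norms must have the same length")
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    scores = [abs(w) * n for w, n in zip(weights, act_norms)]
    k = int(len(weights) * sparsity)  # number of weights to remove
    drop = set(sorted(range(len(weights)), key=scores.__getitem__)[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

Two criteria can keep very different weights at the same sparsity, which is one reason a single aggregate metric such as perplexity may not reveal what a compressed model has lost.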

AI · Bullish · arXiv – CS AI · May 11 · 7/10

XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling

XiYOLO is a new energy-efficient object detection framework that uses neural architecture search and scaling techniques to optimize AI models for edge devices with strict power constraints. The system achieves 20-53% energy reductions compared to YOLOv12 baselines across GPU and NPU deployments while maintaining competitive accuracy metrics.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

Researchers demonstrate that int4 quantization of KV caches on Apple Silicon's unified memory architecture actually improves performance over fp16, delivering 3-8% faster inference while reducing memory usage by 3x. This inverts the traditional quality-latency tradeoff through a fused Metal kernel combining sign-randomized FFT, per-channel scaling, and int4 packing, with applications from 1B to 1.5B parameter models.

🏢 Hugging Face
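Two of the ingredients named in the summary, per-channel scaling and int4 packing, can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's fused Metal kernel: it uses symmetric quantization with one scale per channel, packs two 4-bit values per byte, and omits the sign-randomized FFT step entirely. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(channel: List[float]) -> Tuple[bytes, float]:
    """Quantize one channel to signed int4 and pack two values per byte."""
    max_abs = max((abs(v) for v in channel), default=0.0)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0  # one scale per channel
    q = [max(-8, min(7, round(v / scale))) for v in channel]
    if len(q) % 2:
        q.append(0)  # pad so values pair up into whole bytes
    # Low nibble holds the even-indexed value, high nibble the odd-indexed one.
    packed = bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4) for i in range(0, len(q), 2))
    return packed, scale


def dequantize_int4(packed: bytes, scale: float, n: int) -> List[float]:
    """Unpack n int4 values and rescale them to floats."""
    out: List[float] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            signed = nibble - 16 if nibble >= 8 else nibble  # undo two's complement
            out.append(signed * scale)
    return out[:n]
```

Each stored value costs half a byte plus a share of the per-channel scale, compared with two bytes for fp16.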

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Litespark Inference on Consumer CPUs: Custom SIMD Kernels for Ternary Neural Networks

Litespark-Inference introduces custom SIMD kernels that enable efficient large language model inference on standard consumer CPUs by exploiting ternary neural networks (weights constrained to -1, 0, +1), replacing floating-point multiplication with simple addition and subtraction. The solution achieves dramatic performance improvements—9.2x faster latency and 52x higher throughput on Apple Silicon—making AI workloads accessible to billions of underutilized personal computers.
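The core idea, that ternary weights turn multiplication into addition and subtraction, can be shown with a scalar sketch. This is not Litespark's SIMD kernel; `ternary_matvec` is a hypothetical name, and a real kernel would operate on packed weights with vector instructions.

```python
from typing import List


def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector
    using only additions and subtractions."""
    out: List[float] = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1 weight: add the activation
            elif w == -1:
                acc -= xi  # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # 0 weight: the activation is skipped
        out.append(acc)
    return out
```

For example, `ternary_matvec([[1, 0, -1]], [2.0, 3.0, 5.0])` computes `2.0 - 5.0` for its single output without any multiply.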

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

Researchers introduce Vec-LUT, a novel vector-based lookup table technique that dramatically improves ultra-low-bit LLM inference on edge devices by addressing memory bandwidth underutilization. The method achieves up to 4.2x performance improvements over existing approaches, enabling faster LLM execution on CPUs than specialized NPUs.
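Table-lookup inference in general replaces arithmetic on low-bit weights with indexing into precomputed partial sums. The sketch below shows that general idea for 1-bit (+1/-1) weights in groups of four; it is an assumption-laden illustration and does not reproduce Vec-LUT's vector table layout. All names and the group size are hypothetical.

```python
from typing import List

GROUP = 4  # activations covered by one table lookup (illustrative choice)


def build_tables(x: List[float]) -> List[List[float]]:
    """For each group of activations, precompute the signed sum for every
    sign pattern: bit k set means +x[k], bit k clear means -x[k]."""
    if len(x) % GROUP:
        raise ValueError("len(x) must be a multiple of GROUP")
    tables = []
    for g in range(0, len(x), GROUP):
        chunk = x[g:g + GROUP]
        tables.append([
            sum(v if (pattern >> k) & 1 else -v for k, v in enumerate(chunk))
            for pattern in range(1 << GROUP)
        ])
    return tables


def pack_row(signs: List[int]) -> List[int]:
    """Pack a row of +1/-1 weights into one GROUP-bit index per group."""
    indices = []
    for g in range(0, len(signs), GROUP):
        pattern = 0
        for k, s in enumerate(signs[g:g + GROUP]):
            if s == 1:
                pattern |= 1 << k
        indices.append(pattern)
    return indices


def lut_matvec(packed_rows: List[List[int]], tables: List[List[float]]) -> List[float]:
    """Each output is a sum of table lookups, one per activation group."""
    return [sum(tables[g][p] for g, p in enumerate(row)) for row in packed_rows]
```

The tables are built once per input vector and shared by every weight row, so the per-row cost drops to one lookup and one add per group.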

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Towards Green Wearable Computing: A Physics-Aware Spiking Neural Network for Energy-Efficient IMU-based Human Activity Recognition

Researchers have developed PAS-Net, a physics-aware spiking neural network that dramatically reduces power consumption in wearable IMU-based human activity recognition systems. The architecture achieves state-of-the-art accuracy while cutting energy consumption by up to 98% through sparse integer operations and an early-exit mechanism, establishing a new standard for ultra-low-power edge computing on battery-constrained devices.
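The summary does not say how PAS-Net decides when to exit early. A common generic approach, assumed here purely for illustration, is to stop at the first stage whose top softmax probability clears a fixed threshold. The names `early_exit_predict` and `threshold` are hypothetical, and each stage is modeled as a zero-argument callable so that later stages are never evaluated after an exit.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    stages: Sequence[Callable[[], Sequence[float]]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Return (predicted_class, exit_stage_index).

    Stages run in order; the first whose top probability reaches the
    threshold decides the answer, and the last stage always decides.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    last = len(stages) - 1
    for i, stage in enumerate(stages):
        probs = softmax(stage())
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold or i == last:
            return best, i
    raise AssertionError("unreachable")
```

Easy inputs leave at an early stage and skip the remaining computation, which is where the energy saving of early-exit designs comes from.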

AI · Bearish · arXiv – CS AI · Mar 27 · 7/10

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Researchers discovered significant privacy vulnerabilities in local Vision-Language Models that use Dynamic High-Resolution preprocessing. The dual-layer attack framework can exploit execution-time variations and cache patterns to infer sensitive information about processed images, even when models run locally for privacy.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI

SPARQ introduces a unified framework combining spiking neural networks, quantization-aware training, and reinforcement learning-guided early exits for energy-efficient edge AI. The system achieves up to 5.15% higher accuracy than conventional quantized SNNs while reducing system energy consumption by over 330 times and cutting synaptic operations by over 90%.

AI · Neutral · arXiv – CS AI · Mar 16 · 7/10

Embedded Quantum Machine Learning in Embedded Systems: Feasibility, Hybrid Architectures, and Quantum Co-Processors

A research paper explores the feasibility of embedded quantum machine learning (EQML) on edge devices such as IoT nodes and drones by 2026. The study identifies hybrid workflows and embedded quantum co-processors as the most viable implementation pathways, while highlighting major barriers including latency, data encoding overhead, and energy constraints.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

Channel-Adaptive Edge AI: Maximizing Inference Throughput by Adapting Computational Complexity to Channel States

Researchers developed a new channel-adaptive AI algorithm that maximizes inference throughput in 6G edge computing networks by dynamically adjusting computational complexity based on channel conditions. The system uses integrated communication and computation (IC²) to optimize both feature compression and model complexity for mobile edge inference.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 2

Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures

Research identifies a critical bottleneck in Vision-Language-Action (VLA) models for edge AI, where up to 75% of latency comes from memory-bound action generation phases. The study analyzes performance on Nvidia edge hardware and projects requirements for scaling to 100B parameter models in robotics applications.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8

RAGdb: A Zero-Dependency, Embeddable Architecture for Multimodal Retrieval-Augmented Generation on the Edge

Researchers introduce RAGdb, an architecture that consolidates Retrieval-Augmented Generation into a single SQLite container, eliminating the need for cloud infrastructure and GPUs. The system achieves 100% entity retrieval accuracy while reducing disk footprint by 99.5% compared to traditional Docker-based RAG stacks, enabling portable AI applications for edge computing and privacy-sensitive environments.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Researchers propose Energy-Aware NECO, a single-pass machine learning method for detecting out-of-distribution data in semantic segmentation tasks. The hybrid approach combines geometric and energy-based scoring to achieve 85.39% detection accuracy while maintaining computational efficiency for edge deployment on mobile robots.

AI · Neutral · arXiv – CS AI · 3d ago · 5/10

Quantum Machine Learning-based 6G edge Network: Enabling Adaptive Communication and Model Aggregation

Researchers propose a quantum machine learning framework for 6G vehicle-to-everything (V2X) communication that combines quantum neural networks, federated learning, and semantic communication to improve efficiency and robustness in autonomous transportation systems. The framework addresses limitations of classical ML in handling high-dimensional data, heterogeneous networks, and dynamic channel conditions.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

STARS: Spike Tail-Aware Relational Synthesis for ANN-to-SNN Data-Free Knowledge Distillation

Researchers introduce STARS, a data-free knowledge distillation method that improves the transfer of learning from artificial neural networks (ANNs) to spiking neural networks (SNNs) without access to original training data. The technique combines batch normalization matching with relational consistency and threshold-aware regularization, achieving significant accuracy improvements across standard benchmarks.

AI · Bearish · arXiv – CS AI · 3d ago · 6/10

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Researchers audit NVIDIA's GB10 edge AI hardware shipping in 2026 and find it lacks critical energy monitoring capabilities at the CPU level, preventing process-level energy attribution essential for optimizing agentic AI workloads. While MediaTek firmware contains undocumented energy telemetry, NVIDIA has stated no plans to expose this data, forcing developers to rely on external DC metering as a workaround.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference

ASTRA is a new framework that enables efficient multi-device Transformer inference by combining sequence parallelism with mixed-precision attention, allowing non-local token embeddings to be transmitted as compressed codes while maintaining full precision for local attention. The system achieves significant speedups (up to 2.64x) over single-device inference while operating at extremely low bandwidth requirements (as low as 10 Mbps), making it practical for bandwidth-constrained environments.

🧠 Llama

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Edge AI Deployment Beyond Models: A BSP-Aware Systems Framework for Industrial Embedded Platforms

This academic paper presents a systematic framework for deploying AI models on industrial embedded systems, arguing that successful Edge AI requires treating deployment as a holistic systems problem rather than a late-stage packaging task. The five-layer framework addresses hardware, BSP/OS adaptation, runtime acceleration, application inference, and operations/validation, with implications for reproducibility and field reliability in long-lifecycle industrial products.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Agent-X: Full Pipeline Acceleration of On-device AI Agents

Researchers introduce Agent-X, a software framework that accelerates LLM-based agents running on edge devices by optimizing both prefill and decode stages through prompt rewriting and LLM-free speculative decoding. The framework achieves 1.61x end-to-end speedup with no accuracy loss, addressing a critical performance bottleneck in on-device AI deployments.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Agentic Performance at the Edge: Insights from Benchmarking

Researchers benchmark agentic AI performance on edge devices constrained to 8 billion parameters or smaller, finding that model quality loss isn't simply proportional to parameter reduction. The study reveals that optimal edge-agent deployment requires joint optimization of model selection and tool workflows, with distinct failure patterns across model families guiding practical deployment strategies.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Optimized Culprit Identification Using Mobilenet and Attention Mechanisms

Researchers propose an optimized deep learning model combining MobileNet with attention mechanisms for automated facial identification in surveillance systems, achieving 97.8% accuracy while maintaining computational efficiency for real-time deployment.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

TinySSL: Distilled Self-Supervised Pretraining for Sub-Megabyte MCU Models

Researchers introduce CA-DSSL, a new self-supervised learning technique that enables efficient AI model training on microcontrollers with under 500K parameters. The method surpasses existing approaches by 18 percentage points on standard benchmarks while requiring significantly fewer parameters, achieving 94% of supervised learning performance with models deployable in just 378 KB of memory.
