#distributed-systems News & Analysis

70 articles tagged with #distributed-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

🧠

Color Matters: Trigger Color Affects Success in Federated Backdoor Attacks

Researchers demonstrate that trigger color significantly affects the success of backdoor attacks in federated learning systems, with white triggers more effective against blonde-class targets and black triggers more effective against black-class targets. This finding reveals a previously underexplored vulnerability in distributed machine learning systems where poisoned updates can evade detection while maintaining benign performance.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

SwarmX: Agentic Scheduling for Low-Latency Agentic Systems

SwarmX is a new scheduling system designed to optimize GPU-CPU cluster performance for agentic AI applications that make multiple model calls and tool executions. The system uses neural predictors to reduce tail latency by up to 61.5% and sustain 2x higher throughput than production schedulers, addressing a critical infrastructure gap as AI agents become more complex.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

Delay-Adaptive Speculation Control for Low-Latency Edge-Cloud LLM Inference

Researchers develop a delay-adaptive algorithm for optimizing speculative decoding in distributed LLM inference across edge-cloud systems. The study proves optimal draft length follows a finite threshold policy and introduces UCB-SpecStop, an online control algorithm that reduces per-token latency by up to 22.4% compared to existing methods while adapting to varying network conditions.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

🧠

Autonomous Event-Driven Multi-Agent Orchestration for Enterprise AI at Scale

Researchers evaluated multi-agent orchestration architectures across enterprise scales and found that scalability, rather than task complexity, is the primary performance bottleneck. A new Task Manager framework reduces latency and improves event handling at enterprise scale, which matters for production AI systems that manage hundreds of agents.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

🧠

Federated continual learning: A comprehensive survey on lifelong and privacy-preserving learning over distributed and non-stationary data

A comprehensive survey examines Federated Continual Learning (FCL), which combines federated learning's privacy-preserving distributed training with continual learning's ability to adapt to evolving data. The research addresses a critical gap in current FL systems that assume static data, proposing frameworks for real-world applications like healthcare and IoT where data streams continuously shift, causing performance degradation and catastrophic forgetting.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

🧠

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

Researchers introduce ActiveMem, a distributed memory framework that decouples storage from reasoning in large language models, enabling agents to handle longer tasks without context overload. The system separates executive planning from memory management—inspired by human brain architecture—and demonstrates state-of-the-art performance on complex reasoning benchmarks while reducing computational overhead.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure

Researchers propose Semantic Quorum Assurance (SQA), a new control-plane mechanism that uses multiple AI validator agents to assess the safety of infrastructure mutations in cloud systems before execution. The approach reduces unsafe approvals from 18.5% with single-agent validation to 0.3% by aggregating diverse validator judgments under a risk-adaptive quorum system, adding 1.45–4.12 seconds of latency.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

Lodestar: An Online-Learning LLM Inference Router

Researchers introduce Lodestar, a machine learning-based request routing system that dynamically assigns large language model inference tasks to GPU instances in distributed clusters. The system achieves up to 4.38x improvements in latency metrics compared to existing heuristics by continuously learning routing strategies in real time.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

🧠

Agent Operating Systems (AOS): Integrating Agentic Control Planes into, and Beyond, Traditional Operating Systems

Researchers propose Agent Operating Systems (AOS), a new systems architecture that integrates agentic AI control planes into traditional operating systems to better manage long-lived, goal-directed AI agents. The framework addresses fundamental OS limitations in scheduling, memory management, security, and observability for AI workloads that operate differently from deterministic programs.

AI × Crypto · Neutral · arXiv – CS AI · May 29 · 7/10

🤖

Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Researchers introduced Agora, a multi-agent LLM framework designed to detect deep logic bugs in consensus protocols used by blockchains and distributed systems. The system discovered 15 previously unknown protocol-level bugs in major implementations (Raft, EPaxos, HotStuff, BullShark) that existing LLM approaches failed to identify, demonstrating the effectiveness of domain-aware collaborative AI for protocol verification.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

🧠

FD-RAG: Federated Dual-System Retrieval-Augmented Generation

FD-RAG introduces a federated framework for retrieval-augmented generation that enables decentralized LLM deployment across edge devices without centralizing sensitive data. The system achieves 7.8% accuracy improvements and 8.4x latency reductions by splitting lightweight memory access from expensive LLM reasoning, while aggregating anonymized knowledge across fragmented device networks.

AI × Crypto · Bullish · arXiv – CS AI · May 28 · 7/10

🤖

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks

SwarmHarness proposes a decentralized protocol enabling unused computing resources across personal devices and servers to be shared through a self-organizing network of AI agents without central authority. The system combines peer discovery via DHT, intelligent task routing based on capability and trust metrics, and a Shapley-value-based credit mechanism to align incentives and create a self-regulating participation economy.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

🧠

From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs

A production analysis of a 504-GPU NVIDIA B200 cluster reveals that large-scale AI training requires multi-signal failure detection strategies, with a 100% detection rate achieved through statistical analysis of 751 metrics. The study identifies storage I/O bottlenecks invisible at smaller scales and shows auto-retry mechanisms succeed 2.7x more often than manual recovery, providing critical operational insights for distributed AI infrastructure.

🏢 Nvidia

AI × Crypto · Bullish · arXiv – CS AI · May 12 · 7/10

🤖

Robust Multi-Agent LLMs under Byzantine Faults

Researchers propose Self-Anchored Consensus (SAC), a decentralized protocol enabling LLM agents to collaborate reliably over peer-to-peer networks while resisting Byzantine attacks. The method allows agents to iteratively filter unreliable messages and refine outputs without centralized coordination, addressing a critical vulnerability in distributed AI systems.

AI × Crypto · Bullish · Crypto Briefing · May 3 · 7/10

🤖

Ben Fielding: Neural architecture search automates deep learning, the shift to horizontal scaling is essential, and blockchain security enhances consensus algorithms | Unchained

Ben Fielding discusses how neural architecture search (NAS) automates deep learning model design, emphasizes the necessity of horizontal scaling in distributed systems, and explores blockchain security's role in strengthening consensus algorithms. The convergence of machine learning and blockchain represents a transformative shift comparable to MapReduce's impact on distributed computing.

AI × Crypto · Neutral · arXiv – CS AI · Apr 14 · 7/10

🤖

Emergent Social Structures in Autonomous AI Agent Networks: A Metadata Analysis of 626 Agents on the Pilot Protocol

Researchers analyzed 626 autonomous AI agents that independently joined the Pilot Protocol, discovering that these machines formed complex social structures mirroring human networks without explicit instruction. The emergent topology exhibits small-world properties, preferential attachment, and specialized clustering, representing the first empirical evidence of spontaneous social organization among autonomous AI systems.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Distributed Interpretability and Control for Large Language Models

Researchers have developed a scalable system for interpreting and controlling large language models distributed across multiple GPUs, achieving up to 7x memory reduction and 41x throughput improvements. The method enables real-time behavioral steering of frontier LLMs like LLaMA and Qwen without fine-tuning, with results released as open-source tooling.

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

🧠

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Researchers have developed Glia, an AI architecture using large language models in a multi-agent workflow to autonomously design computer systems mechanisms. The system generates interpretable designs for distributed GPU clusters that match human expert performance while providing novel insights into workload behavior.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

🧠

Efficient Federated Conformal Prediction with Group-Conditional Guarantee

Researchers propose group-conditional federated conformal prediction (GC-FCP), a new protocol that enables trustworthy AI uncertainty quantification across distributed clients while providing coverage guarantees for specific groups. The framework addresses challenges in federated learning for applications in healthcare, finance, and mobile sensing by creating compact weighted summaries that support efficient calibration.

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

🧠

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Researchers propose treating multi-agent AI memory as a computer architecture problem, introducing a three-layer memory hierarchy and identifying critical protocol gaps. The paper highlights multi-agent memory consistency as the most pressing challenge for building scalable collaborative AI systems.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

🧠

xLLM Technical Report

xLLM is a new open-source Large Language Model inference framework that delivers significantly improved performance for enterprise AI deployments. The framework achieves 1.7-2.2x higher throughput compared to existing solutions like MindIE and vLLM-Ascend through novel architectural optimizations including decoupled service-engine design and intelligent scheduling.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Hallucination as Context Drift: Synchronization Protocols for Multi-Agent LLM Systems

Researchers propose that hallucinations in multi-agent LLM systems stem from context drift—misaligned knowledge states between concurrent agents—rather than model deficiencies alone. They introduce the Context Divergence Score and Shared State Verification Protocol to synchronize agent states efficiently, achieving 34% fewer hallucinations than naive broadcast methods while using 58% fewer API calls.

🧠 Claude

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Fed-CausalDiff: Decoupled Synchronization for Federated Do-Simulation and Policy Evaluation

Fed-CausalDiff introduces a federated learning framework that enables causal inference and policy evaluation across decentralized data sources by separating global causal mechanisms from local confounders. The approach improves accuracy in treatment effect estimation and policy value calculation while reducing communication overhead, addressing a fundamental limitation of standard federated learning methods that cannot handle interventional scenarios.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Emergent Communication in Continuous Worlds: Self-Organisation of Conceptually Grounded Vocabularies at Scale

Researchers developed a decentralized methodology enabling autonomous agent populations to establish shared linguistic conventions through local interactions, where symbolic labels become grounded in continuous feature representations. The approach demonstrates scalability across 37 datasets and robustness to perceptual variation, with emergent conventions capable of self-adapting to environmental changes.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework

Researchers introduce DeALOG, a decentralized multi-agent framework that uses specialized AI agents coordinating through a shared natural-language log to answer complex questions spanning text, tables, and images. The system demonstrates competitive performance on multiple benchmarks while improving robustness through collaborative verification without central control.
