Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
Researchers propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework that applies classical computer architecture principles to large language models and agentic AI systems. The paper maps recurring engineering challenges—cache reuse, context management, agent scheduling, and permission control—to traditional systems problems, introducing three design laws to optimize model-native computing efficiency and coordination.
This arXiv paper is a conceptual contribution on how AI systems should be engineered as they move from isolated models to integrated computing platforms. The authors observe that as LLMs power autonomous agents and code generation tools, they run into the same architectural problems that computer architecture and operating systems research has addressed for decades: resource allocation, scheduling, memory management, and access control. Rather than reinventing solutions, ICAM proposes borrowing proven patterns from CPU and OS design.
The framework's dual-plane architecture—separating probabilistic execution (what can be computed) from deterministic control (what should be computed)—offers a useful mental model for developers building agent systems. The three proposed laws (Semantic Locality, Context Budget, and Agent Speedup) provide testable hypotheses about scaling bottlenecks. By framing LLM systems through a computer architecture lens, the paper creates common vocabulary across disparate areas: LLM-as-OS research, memory optimization, multi-agent frameworks, and safety governance.
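The dual-plane idea can be illustrated with a minimal sketch. This is not code from the paper: the names `Action`, `ALLOWED_TOOLS`, `propose_action`, `is_permitted`, and `run_step` are hypothetical, and a seeded random choice stands in for an LLM call. It only shows a non-deterministic proposer being gated by a deterministic permission check.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A tool call proposed by the model (hypothetical structure)."""
    tool: str
    argument: str


# Hypothetical allow-list standing in for the deterministic control plane.
ALLOWED_TOOLS = frozenset({"read_file", "search"})


def propose_action(rng: random.Random) -> Action:
    """Stand-in for the probabilistic execution plane (e.g. an LLM call)."""
    tool = rng.choice(["read_file", "search", "delete_file"])
    return Action(tool=tool, argument="notes.txt")


def is_permitted(action: Action) -> bool:
    """Deterministic control: the same action always gets the same verdict."""
    return action.tool in ALLOWED_TOOLS


def run_step(rng: random.Random) -> str:
    """Propose an action, then execute it only if the control plane allows it."""
    action = propose_action(rng)
    if not is_permitted(action):
        return f"blocked: {action.tool}"
    return f"executed: {action.tool}"
```

Calling `run_step(random.Random(seed))` with different seeds yields different proposals, but a `delete_file` proposal is always blocked, because the permission decision does not depend on the sampling step.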
For the AI development community, this work matters because fragmented approaches to these problems waste engineering effort. Developers building multi-agent systems or scaling inference face identical challenges but solve them independently. ICAM's unified framework could accelerate practical improvements in throughput, latency, and resource efficiency. The paper also identifies where the analogy breaks down—probabilistic computing differs fundamentally from deterministic CPUs—which grounds the contribution in realistic constraints.
The research roadmap suggests the paper is opening a field rather than closing one. Future work on model-native system design, standardized agent protocols, and formal verification of safety properties could build on this framework, which would make it a possible foundation for next-generation AI infrastructure.
- ICAM provides a six-layer architectural framework mapping computer science principles to LLM-based systems, unifying disparate engineering practices.
- The dual-plane model separates what LLMs can compute probabilistically from what they should compute deterministically, resolving ambiguity about their role.
- Three design laws (Semantic Locality, Context Budget, and Agent Speedup) offer testable principles for optimizing inference, memory, and multi-agent coordination.
- This conceptual contribution lacks experimental validation but proposes a research roadmap for model-native computing infrastructure.
- Understanding LLM systems through classical architecture lenses could accelerate practical improvements in efficiency, safety governance, and developer productivity.