#llm-architecture News & Analysis

41 articles tagged with #llm-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

VISTA Architect: A graph database-oriented health AI system demonstrated in multidisciplinary tumor boards

Stanford Medicine researchers unveiled VISTA Architect, a graph database-powered AI system that integrates large language models with electronic health records to achieve 96.4% accuracy in clinical data extraction for tumor board preparation. The architecture precomputes patient histories into organized knowledge graphs, reducing processing time and latency compared to traditional RAG approaches while maintaining full data provenance.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management

StackPlanner introduces a hierarchical multi-agent system that improves coordination among large language model-based agents through explicit memory management and reusable experience learning. The framework addresses critical limitations in centralized multi-agent architectures by decoupling high-level coordination from task execution and enabling agents to retain and leverage past coordination strategies, demonstrating improved performance on complex benchmarks.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science

MatMind is a generative foundation model designed for crystal materials science that unifies structure prediction, property forecasting, and material design within a single LLM-based framework. The model surpasses specialized graph neural networks on benchmark tasks while achieving 65.3% success on crystal generation, demonstrating that unified AI architectures can compete with purpose-built narrow specialists.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Researchers introduce MicroSkill Architecture, a modular framework that organizes AI coding knowledge into atomic skill capsules rather than feeding entire codebases to language models. The approach reduces token consumption by 90%, doubles compilation success rates, and eliminates architectural violations in enterprise systems.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Researchers establish fundamental information-theoretic limits on decoder-only transformer attention for state-tracking tasks, proving extended reasoning degrades performance beyond a 'Deterministic Horizon' of 19-31 steps. Tool delegation consistently outperforms neural chain-of-thought across 12 models (86-94% vs 24-42% accuracy), suggesting hybrid agentic systems require external tools rather than pure neural reasoning for complex deterministic tasks.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Researchers propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework that applies classical computer architecture principles to large language models and agentic AI systems. The paper maps recurring engineering challenges (cache reuse, context management, agent scheduling, and permission control) to traditional systems problems, and introduces three design laws to optimize model-native computing efficiency and coordination.

🧠 Claude

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MemPro: Agentic Memory Systems as Evolvable Programs

Researchers introduce MemPro, an evolution framework that treats autonomous agent memory systems as adaptable programs rather than static pipelines. By iteratively diagnosing failures and refining the entire memory-construction-retrieval pipeline, MemPro outperforms fixed baselines on multiple benchmarks while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Periodic RoPE for Infinite Context LLMs

Researchers propose Periodic RoPE (P-RoPE), a novel positional encoding mechanism that combines sliding window attention for local dependencies with global attention layers lacking positional constraints, enabling language models to theoretically support infinite context windows without performance degradation. The approach addresses a fundamental limitation in current LLMs where model performance degrades when sequence length exceeds the pre-trained range of positional encodings like RoPE.
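As a rough illustration of the sliding-window component mentioned above, here is a minimal sketch of a causal sliding-window attention mask next to a full causal mask. It shows the generic technique only and is not the paper's implementation. The function names and the `window` parameter are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is at most `window - 1` positions behind i and never ahead of it.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def full_causal_mask(seq_len: int) -> list[list[bool]]:
    """Standard causal mask: every position sees itself and all earlier ones."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Each row is one query position; 1 marks a visible key position.
    for row in sliding_window_mask(seq_len=4, window=2):
        print("".join("1" if visible else "0" for visible in row))
```

With a fixed window, each query attends to a bounded number of keys regardless of sequence length, which is why local layers of this kind do not depend on absolute position growing without limit.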

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Position: Avoid Overstretching LLMs for every Enterprise Task

A new position paper argues that enterprises should stop treating large language models as monolithic solutions for all tasks and instead use them primarily for structured data extraction within modular architectures. The paper contends that LLMs have inherent capacity limits for enterprise knowledge needs and proposes delegating computation and storage to specialized components such as knowledge bases and symbolic systems for better reliability and cost efficiency.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Researchers demonstrate that transformer models equipped with continuous latent context tokens can efficiently implement online learning algorithms without parameter updates. A small GPT-2-style model trained with this approach outperforms much larger language models on synthetic online prediction tasks, suggesting a promising architectural direction for adaptive AI systems.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Researchers propose Agent Cybernetics, a theoretical framework applying mid-20th century control systems theory to modern LLM-based AI agents. The framework addresses critical gaps in how foundation agents are designed, offering scientific principles for reliability, continuous operation, and safe self-improvement across long-horizon tasks.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

LCM: Lossless Context Management

Researchers introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks up to 1M tokens. LCM combines recursive context compression with engine-managed task partitioning, representing an evolution of recursive language models that prioritizes reliability and state retrievability over flexibility.

🧠 Claude 🧠 Opus

AI · Bearish · arXiv – CS AI · May 4 · 7/10

Language Models Struggle to Use Representations Learned In-Context

A new study finds that large language models struggle to use representations they learn from in-context information, even though they can encode this information internally. The findings suggest current LLMs have fundamental limitations in adapting to novel contexts, which affects their ability to generalize learned patterns to downstream tasks.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Researchers introduce CoMeT (Collaborative Memory Transformer), a novel architecture that enables large language models to process arbitrarily long sequences with constant memory usage and linear time complexity. The system uses a dual-memory approach with FIFO queues and gated updates, demonstrating remarkable performance on long-context tasks including 1M token sequences and real-world applications.

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory

Researchers introduce CoThinker, a multi-agent LLM framework inspired by Cognitive Load Theory, which distributes computational tasks across specialized agents to overcome context limitations. The system shows performance gains on reasoning-heavy tasks but reveals coordination overhead on simpler tasks, offering principled design insights for multi-agent AI systems.

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications

Researchers compare retrieval-augmented generation (RAG) with long-context prompting for document-grounded AI applications, finding that long-context prompting achieves higher accuracy (73.1% vs 65.4%) but at 26x the token cost. The study frames this trade-off as a frontier between 'epistemic accuracy' and computational expense, with significant implications for resource-constrained organizations.
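To make the trade-off concrete, the arithmetic can be sketched as follows, using only the accuracy figures and the 26x cost ratio quoted above. The baseline of 1,000 RAG tokens per query is a hypothetical placeholder, not a number from the study, and `tradeoff` is an illustrative helper name.

```python
# Figures quoted in the summary above.
LONG_CONTEXT_ACCURACY = 73.1  # percent
RAG_ACCURACY = 65.4           # percent
TOKEN_COST_RATIO = 26         # long-context tokens per RAG token


def tradeoff(rag_tokens_per_query: int) -> dict:
    """Return the accuracy gain and extra tokens of long-context over RAG.

    rag_tokens_per_query is a hypothetical baseline, not taken from the study.
    """
    long_context_tokens = rag_tokens_per_query * TOKEN_COST_RATIO
    gain_points = round(LONG_CONTEXT_ACCURACY - RAG_ACCURACY, 1)
    extra_tokens = long_context_tokens - rag_tokens_per_query
    return {
        "gain_points": gain_points,
        "extra_tokens": extra_tokens,
        "extra_tokens_per_point": extra_tokens / gain_points,
    }


result = tradeoff(rag_tokens_per_query=1_000)
print(result)
```

Under that assumed baseline, the 7.7-point accuracy gain costs 25,000 additional tokens per query, or roughly 3,247 extra tokens per accuracy point.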

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Researchers present SWARR, a two-stage method combining supervised fine-tuning and reinforcement learning to make sliding-window attention (SWA) competitive with standard self-attention for mathematical reasoning tasks. By using RL to adapt model trajectories to SWA's architectural constraints, the approach recovers much of the accuracy lost during conversion while maintaining linear-complexity efficiency benefits.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild

Researchers demonstrate that scaling large language models alone is insufficient for effective tutoring. By combining knowledge graphs with reinforcement learning to structure Socratic dialogue, their system outperforms frontier LLMs and specialized education models in teaching STEM and non-STEM subjects over extended sessions.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Researchers introduce Whisper-GPT, a hybrid language model that combines continuous audio representations (spectrograms) with discrete acoustic tokens to improve speech and music generation. This approach addresses context length limitations in traditional token-based models while maintaining high-fidelity audio synthesis capabilities.

🏢 Perplexity

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

Researchers introduce UniTok, a universal tokenizer that converts continuous time series data into discrete tokens, and UniTok-FM, a foundation model pretrained via next-token prediction. This unified approach supports forecasting, generation, and classification without task-specific modifications, achieving performance competitive with specialized models along with zero-shot and few-shot inference.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Researchers present SearchSwarm, a framework that trains large language models to intelligently delegate complex tasks to subagents while managing finite context windows. The resulting 30B-parameter model achieves state-of-the-art performance on research benchmarks by learning when and what to delegate, addressing a critical bottleneck in agentic AI systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Reachability and asymptotics of Gaussian Transformer dynamics

Researchers have formulated Transformer data propagation as a nonlinear control system and proven that Gaussian distributions remain Gaussian through the network's layers. This reduces infinite-dimensional dynamics to finite-dimensional equations governing mean and covariance evolution, connecting Transformer expressiveness to classical control theory and revealing conditions for stability or divergence.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Declarative Skills for AI Agents in Knowledge-Grounded Tool-Use Workflows

Researchers compare three orchestration approaches for AI agents handling customer-service workflows: declarative agents using natural-language skill files, imperative agents with programmatic state machines, and unscaffolded baseline agents. The study finds that retrieval quality is the dominant bottleneck, and declarative skills improve performance on procedural tasks only when evidence quality is high.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Inverse Depth Scaling From Most Layers Being Similar

Researchers analyzing large language models find that loss scales inversely with network depth, suggesting most layers function similarly and reduce error through ensemble averaging rather than compositional learning. This inefficient scaling pattern may stem from architectural constraints in residual networks, indicating that improving LLM efficiency requires fundamental architectural innovations rather than simply adding more layers.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen is a neuro-symbolic AI system that generates physics diagrams from natural language text while maintaining strict physical accuracy. By combining large language models, deterministic solvers, and vision-language models in a pipeline, it overcomes the hallucination problems of current generative models and outperforms GPT-4, Gemini 2.5, and Gemini 3 Pro on physics problems spanning mechanics, optics, and electromagnetism.

🧠 GPT-5 🧠 Gemini
