#llm-architecture News & Analysis

22 articles tagged with #llm-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Periodic RoPE for Infinite Context LLMs

Researchers propose Periodic RoPE (P-RoPE), a novel positional encoding mechanism that combines sliding window attention for local dependencies with global attention layers lacking positional constraints, enabling language models to theoretically support infinite context windows without performance degradation. The approach addresses a fundamental limitation in current LLMs where model performance degrades when sequence length exceeds the pre-trained range of positional encodings like RoPE.
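
The local half of that design can be illustrated with a causal sliding-window mask. This is a minimal sketch, not the paper's implementation: the function names and the window size are assumptions chosen for illustration. It shows why a positional encoding applied only inside such a window never sees a relative offset larger than `window - 1`, however long the sequence grows.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: mask[i][j] is True when query i may attend to key j.

    Query i sees itself and the (window - 1) tokens before it.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def max_relative_offset(mask: list[list[bool]]) -> int:
    """Largest query-key distance that the mask allows."""
    return max(
        i - j
        for i, row in enumerate(mask)
        for j, allowed in enumerate(row)
        if allowed
    )


if __name__ == "__main__":
    short = sliding_window_mask(seq_len=6, window=3)
    long = sliding_window_mask(seq_len=1000, window=3)
    # The bound depends on the window, not on the sequence length.
    print(max_relative_offset(short), max_relative_offset(long))
```

The global layers described in the summary would attend across the whole sequence without this mask; how P-RoPE wires the two together is not specified here.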

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Position: Avoid Overstretching LLMs for every Enterprise Task

A new position paper argues that enterprises should stop treating large language models as monolithic solutions for all tasks and instead use them primarily for structured data extraction within modular architectures. The paper contends that LLMs have inherent capacity limits for enterprise knowledge needs and proposes delegating computation and storage to specialized components like knowledge bases and symbolic systems for better reliability and cost efficiency.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Researchers propose Agent Cybernetics, a theoretical framework applying mid-20th century control systems theory to modern LLM-based AI agents. The framework addresses critical gaps in how foundation agents are designed, offering scientific principles for reliability, continuous operation, and safe self-improvement across long-horizon tasks.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Researchers demonstrate that transformer models equipped with continuous latent context tokens can efficiently implement online learning algorithms without parameter updates. A small GPT-2-style model trained with this approach outperforms much larger language models on synthetic online prediction tasks, suggesting a promising architectural direction for adaptive AI systems.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

LCM: Lossless Context Management

Researchers introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks up to 1M tokens. LCM combines recursive context compression with engine-managed task partitioning, representing an evolution of recursive language models that prioritizes reliability and state retrievability over flexibility.

🧠 Claude · 🧠 Opus

AI · Bearish · arXiv – CS AI · May 4 · 7/10

Language Models Struggle to Use Representations Learned In-Context

A new study finds that large language models struggle to use representations they learn from in-context information, even though they can encode this information internally. The findings suggest current LLMs have fundamental limitations in adapting to novel contexts, affecting their ability to generalize learned patterns to downstream tasks.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Researchers introduce CoMeT (Collaborative Memory Transformer), an architecture that enables large language models to process arbitrarily long sequences with constant memory usage and linear time complexity. The system uses a dual-memory approach with FIFO queues and gated updates, and reports strong results on long-context tasks, including 1M-token sequences and real-world applications.
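
To make the constant-memory claim concrete, here is a toy sketch of a dual memory built from a bounded FIFO queue and a gated running state. It is an illustration under stated assumptions, not CoMeT's actual design: the class name `ToyDualMemory`, the scalar gate, and the use of plain float lists as chunk summaries are all hypothetical, and in a real model the gate would be learned rather than passed in.

```python
import math
from collections import deque


class ToyDualMemory:
    """Toy dual memory: a bounded FIFO of recent chunks plus one gated global state."""

    def __init__(self, fifo_size: int, dim: int) -> None:
        # deque(maxlen=...) evicts the oldest entry automatically (FIFO).
        self.recent: deque[list[float]] = deque(maxlen=fifo_size)
        self.global_state: list[float] = [0.0] * dim

    def update(self, chunk_summary: list[float], gate_logit: float) -> None:
        """Blend a new chunk into the global state, then push it onto the FIFO."""
        gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid, in (0, 1)
        self.global_state = [
            (1.0 - gate) * old + gate * new
            for old, new in zip(self.global_state, chunk_summary)
        ]
        self.recent.append(chunk_summary)

    def stored_vectors(self) -> int:
        """Number of vectors held; bounded by fifo_size + 1 regardless of input length."""
        return len(self.recent) + 1


if __name__ == "__main__":
    memory = ToyDualMemory(fifo_size=4, dim=2)
    for step in range(100):
        memory.update([float(step), 1.0], gate_logit=0.0)
    print(memory.stored_vectors())  # stays at 5 however many chunks arrive
```

Each update costs the same amount of work, so processing n chunks takes time linear in n while the stored state stays fixed in size.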

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Researchers introduce Think Fast, Talk Smart, a hybrid system that combines deterministic computation with bounded LLM calls for generating health text from structured data. The approach achieves lower errors and costs than pure LLM-based alternatives by reserving neural computation for expression tasks while delegating analysis, comparison, and ranking to deterministic code.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance

Researchers propose a modular architecture for educational AI chatbots designed to enforce pedagogical principles and prevent negative learning outcomes. The approach addresses structural limitations in current monolithic LLM solutions by incorporating targeted modules at different exercise-solving stages, enabling more transparent and controlled student guidance.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Researchers present a modular LLM-based architecture for detecting and quantifying human values in text, addressing the need for ethical decision-making in autonomous AI systems. The approach separates value conceptualization from detection, enabling scalable application across different ethical frameworks and demonstrating strong performance on the ValueEval dataset.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Researchers introduce HGMem, a hypergraph-based working memory system that enhances multi-step retrieval-augmented generation (RAG) for large language models by modeling complex relational dependencies among facts. Unlike traditional RAG systems that treat memory as passive storage, HGMem dynamically structures information as interconnected high-order relationships, demonstrating improved performance on global sense-making benchmarks requiring complex reasoning across extended contexts.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

Researchers introduce DARE, a technique that reduces computational redundancy in Diffusion Language Models by reusing cached attention activations across tokens. The method achieves per-layer latency improvements of up to 1.20x while maintaining generation quality, addressing efficiency gaps between diffusion-based and autoregressive language models.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLM Translation of Compiler Intermediate Representation

Researchers introduce IRIS-14B, a 14-billion-parameter LLM fine-tuned to translate compiler intermediate representations between GCC's GIMPLE and LLVM IR, achieving up to 44 percentage points higher accuracy than existing state-of-the-art models. The approach demonstrates how LLMs can function as interoperability layers in hybrid compiler architectures, enabling cross-toolchain workflows without modifying existing compiler infrastructure.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Hierarchical Mixture-of-Experts with Two-Stage Optimization

Researchers introduce Hi-MoE, a hierarchical Mixture-of-Experts framework that addresses a fundamental routing trade-off in sparse MoE models through two-stage optimization: inter-group load balancing and intra-group expert specialization. Tested on large-scale NLP and vision tasks, Hi-MoE achieves a 5.6% perplexity improvement and better expert balance than existing methods.

🏢 Meta · 🏢 Perplexity
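
The hierarchical routing idea in the Hi-MoE summary can be sketched as a two-level top-1 router: pick a group first, then an expert inside that group. This is only an illustration under assumptions; the function names, the top-1 choice, and the simple per-group load fraction are hypothetical and are not taken from the paper, whose optimization objectives are not described here.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(group_logits: list[float], expert_logits: list[list[float]]) -> tuple[int, int, float]:
    """Top-1 hierarchical routing: choose a group, then an expert within it.

    Returns (group index, expert index within the group, combined routing weight).
    """
    group_probs = softmax(group_logits)
    group = max(range(len(group_probs)), key=group_probs.__getitem__)
    expert_probs = softmax(expert_logits[group])
    expert = max(range(len(expert_probs)), key=expert_probs.__getitem__)
    return group, expert, group_probs[group] * expert_probs[expert]


def group_load(assignments: list[int], num_groups: int) -> list[float]:
    """Fraction of tokens sent to each group; a balanced router gives 1 / num_groups each."""
    counts = [0] * num_groups
    for group in assignments:
        counts[group] += 1
    return [count / len(assignments) for count in counts]


if __name__ == "__main__":
    group, expert, weight = route([0.1, 2.0], [[1.0, 0.0], [0.0, 3.0, 1.0]])
    print(group, expert, round(weight, 3))
    print(group_load([1, 1, 0, 1], num_groups=2))
```

Separating the two levels is what lets a balancing objective act on the group loads while a different objective shapes the choice of expert within each group.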

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

Response-G1 introduces a novel framework for real-time video understanding that uses explicit scene graphs to align video evidence with query-specific response conditions, enabling Video-LLMs to make more accurate timing decisions during streaming video analysis without requiring fine-tuning.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

Researchers present a novel logical framework for understanding encoder-decoder transformers using temporal logic extended with counting and past modalities. The work provides theoretical foundations for how these architectures process information across attention mechanisms, with implications for LLM interpretability and design.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

Researchers report that large language models develop distinct hierarchical processing stages (Local, Intermediate, Global) determined by architecture family rather than model size. Using information theory, they show that Llama and Qwen models exhibit markedly different brittleness patterns across layers, with architectural design, not scaling, as the primary driver of model behavior.

🧠 Llama

AI · Neutral · arXiv – CS AI · May 1 · 6/10

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

Researchers demonstrate that large language models perform significantly better on 2D structured tasks when given visual representations rather than serialized text inputs. The study finds that converting 2D data into 1D token sequences creates representational friction that degrades model performance, with gaps widening as task complexity increases.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents

Researchers introduce TiMem, a temporal-hierarchical memory framework that helps conversational AI agents manage long conversation histories beyond LLM context limits. The system organizes interactions through a Temporal Memory Tree, achieving state-of-the-art performance on memory recall benchmarks while reducing memory overhead by over 50%.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture

Researchers present EMBER, a hybrid architecture combining spiking neural networks with large language models where the SNN acts as a persistent, biologically-inspired memory substrate that autonomously triggers LLM reasoning. The system demonstrates emergent autonomous behavior, initiating unprompted user contact after learning associations during idle periods, suggesting a fundamental shift in how AI systems could coordinate cognition and action.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

SymptomWise introduces a deterministic reasoning framework that separates language understanding from diagnostic inference in AI-driven medical systems, combining expert-curated knowledge with constrained LLM use to improve reliability and reduce hallucinations. The system achieved 88% accuracy in placing correct diagnoses in top-five differentials on challenging pediatric neurology cases, demonstrating how structured approaches can enhance AI safety in critical domains.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10

Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures

Researchers conducted an in-depth analysis of in-context learning capabilities across different AI architectures including transformers, state-space models, and hybrid systems. The study reveals that while these models perform similarly on tasks, their internal mechanisms differ significantly, with function vectors playing key roles in self-attention and Mamba layers.