y0news

#llm-architecture News & Analysis

35 articles tagged with #llm-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science

MatMind is a generative foundation model designed for crystal materials science that unifies structure prediction, property forecasting, and material design within a single LLM-based framework. The model surpasses specialized graph neural networks on benchmark tasks while achieving 65.3% success on crystal generation, demonstrating that unified AI architectures can compete with purpose-built narrow specialists.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Researchers introduce MicroSkill Architecture, a modular framework that organizes AI coding knowledge into atomic skill capsules rather than feeding entire codebases to language models. The approach reduces token consumption by 90%, doubles compilation success rates, and eliminates architectural violations in enterprise systems.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Researchers establish fundamental information-theoretic limits on decoder-only transformer attention for state-tracking tasks, proving extended reasoning degrades performance beyond a 'Deterministic Horizon' of 19-31 steps. Tool delegation consistently outperforms neural chain-of-thought across 12 models (86-94% vs 24-42% accuracy), suggesting hybrid agentic systems require external tools rather than pure neural reasoning for complex deterministic tasks.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MemPro: Agentic Memory Systems as Evolvable Programs

Researchers introduce MemPro, an evolution framework that treats autonomous agent memory systems as adaptable programs rather than static pipelines. By iteratively diagnosing failures and refining the entire memory-construction-retrieval pipeline, MemPro outperforms fixed baselines on multiple benchmarks while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Researchers propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework that applies classical computer architecture principles to large language models and agentic AI systems. The paper maps recurring engineering challenges—cache reuse, context management, agent scheduling, and permission control—to traditional systems problems, introducing three design laws to optimize model-native computing efficiency and coordination.

🧠 Claude
AI · Bullish · arXiv – CS AI · May 28 · 7/10

Periodic RoPE for Infinite Context LLMs

Researchers propose Periodic RoPE (P-RoPE), a novel positional encoding mechanism that combines sliding window attention for local dependencies with global attention layers lacking positional constraints, enabling language models to theoretically support infinite context windows without performance degradation. The approach addresses a fundamental limitation in current LLMs where model performance degrades when sequence length exceeds the pre-trained range of positional encodings like RoPE.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Position: Avoid Overstretching LLMs for every Enterprise Task

A position paper argues that enterprises should stop treating large language models as monolithic solutions for all tasks and instead use them primarily for structured data extraction within modular architectures. The paper contends that LLMs have inherent capacity limits for enterprise knowledge needs and proposes delegating computation and storage to specialized components like knowledge bases and symbolic systems for better reliability and cost efficiency.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Researchers propose Agent Cybernetics, a theoretical framework applying mid-20th century control systems theory to modern LLM-based AI agents. The framework addresses critical gaps in how foundation agents are designed, offering scientific principles for reliability, continuous operation, and safe self-improvement across long-horizon tasks.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Researchers demonstrate that transformer models equipped with continuous latent context tokens can efficiently implement online learning algorithms without parameter updates. A small GPT-2-style model trained with this approach outperforms much larger language models on synthetic online prediction tasks, suggesting a promising architectural direction for adaptive AI systems.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

LCM: Lossless Context Management

Researchers introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks up to 1M tokens. LCM combines recursive context compression with engine-managed task partitioning, representing an evolution of recursive language models that prioritizes reliability and state retrievability over flexibility.

🧠 Claude · 🧠 Opus
AI · Bearish · arXiv – CS AI · May 4 · 7/10

Language Models Struggle to Use Representations Learned In-Context

A new study finds that large language models struggle to use representations they learn from in-context information, even though they can encode this information internally. The findings suggest current LLMs have fundamental limitations in adapting to novel contexts, affecting their ability to generalize learned patterns to downstream tasks.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Researchers introduce CoMeT (Collaborative Memory Transformer), a novel architecture that enables large language models to process arbitrarily long sequences with constant memory usage and linear time complexity. The system uses a dual-memory approach with FIFO queues and gated updates, demonstrating remarkable performance on long-context tasks including 1M token sequences and real-world applications.

AI · Neutral · arXiv – CS AI · 11h ago · 6/10

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Researchers introduce Whisper-GPT, a hybrid language model that combines continuous audio representations (spectrograms) with discrete acoustic tokens to improve speech and music generation. This approach addresses context length limitations in traditional token-based models while maintaining high-fidelity audio synthesis capabilities.

🏢 Perplexity
AI · Neutral · arXiv – CS AI · 11h ago · 6/10

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

Researchers introduce UniTok, a universal tokenizer that converts continuous time series data into discrete tokens, enabling UniTok-FM—a foundation model pretrained via next-token prediction. This unified approach supports forecasting, generation, and classification tasks without task-specific modifications, achieving competitive performance with specialized models while enabling zero-shot and few-shot inference capabilities.
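The general idea of turning a continuous series into discrete tokens can be illustrated with a minimal sketch. It uses plain uniform binning, which is an assumption made here for illustration and not the UniTok tokenizer itself; the function names `fit_bins`, `encode`, and `decode` are hypothetical.

```python
from typing import List, Tuple


def fit_bins(series: List[float], vocab_size: int) -> Tuple[float, float]:
    """Return (minimum, bin_width) for uniform quantization of the series."""
    lo, hi = min(series), max(series)
    # Fall back to width 1.0 when the series is constant (zero range).
    width = (hi - lo) / vocab_size or 1.0
    return lo, width


def encode(series: List[float], lo: float, width: float, vocab_size: int) -> List[int]:
    """Map each value to a token id in [0, vocab_size - 1]."""
    tokens = []
    for x in series:
        idx = int((x - lo) / width)
        # Clamp so the maximum value and out-of-range values stay in the vocabulary.
        tokens.append(min(max(idx, 0), vocab_size - 1))
    return tokens


def decode(tokens: List[int], lo: float, width: float) -> List[float]:
    """Map each token id back to the centre of its bin (lossy reconstruction)."""
    return [lo + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    series = [0.0, 1.0, 2.0, 3.0, 4.0]
    lo, width = fit_bins(series, vocab_size=4)
    tokens = encode(series, lo, width, vocab_size=4)
    print(tokens)
    print(decode(tokens, lo, width))
```

Once a series is a token sequence like this, a standard next-token-prediction model can in principle be trained on it; how UniTok builds its vocabulary is not described in the summary above.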

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

Reachability and asymptotics of Gaussian Transformer dynamics

Researchers have formulated Transformer data propagation as a nonlinear control system and proven that Gaussian distributions remain Gaussian through the network's layers. This reduces infinite-dimensional dynamics to finite-dimensional equations governing mean and covariance evolution, connecting Transformer expressiveness to classical control theory and revealing conditions for stability or divergence.

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Researchers present SearchSwarm, a framework that trains large language models to intelligently delegate complex tasks to subagents while managing finite context windows. The resulting 30B-parameter model achieves state-of-the-art performance on research benchmarks by learning when and what to delegate, addressing a critical bottleneck in agentic AI systems.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Declarative Skills for AI Agents in Knowledge-Grounded Tool-Use Workflows

Researchers compare three orchestration approaches for AI agents handling customer-service workflows: declarative agents using natural-language skill files, imperative agents with programmatic state machines, and unscaffolded baseline agents. The study finds that retrieval quality is the dominant bottleneck, and declarative skills improve performance on procedural tasks only when evidence quality is high.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Inverse Depth Scaling From Most Layers Being Similar

Researchers analyzing large language models find that loss scales inversely with network depth, suggesting most layers function similarly and reduce error through ensemble averaging rather than compositional learning. This inefficient scaling pattern may stem from architectural constraints in residual networks, indicating that improving LLM efficiency requires fundamental architectural innovations rather than simply adding more layers.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen is a neuro-symbolic AI system that generates physics diagrams from natural language text while maintaining strict physical accuracy. By combining large language models, deterministic solvers, and vision-language models in a pipeline, it overcomes the hallucination problems of current generative models and outperforms GPT-4, Gemini 2.5, and Gemini 3 Pro on physics problems spanning mechanics, optics, and electromagnetism.

🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

Researchers have developed an alternative to deep neural networks for large language models, based on RBF (Radial Basis Function) networks, which they claim finds optimal solutions in closed form without iterative training. The approach promises improved explainability and accuracy while eliminating the computationally expensive training process required by traditional DNNs.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Researchers introduce Think Fast, Talk Smart, a hybrid system that combines deterministic computation with bounded LLM calls for generating health text from structured data. The approach achieves lower errors and costs than pure LLM-based alternatives by reserving neural computation for expression tasks while delegating analysis, comparison, and ranking to deterministic code.
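The split described above, deterministic code for analysis and ranking with a single bounded LLM call for wording, can be sketched roughly as follows. This is an illustrative assumption about how such a partition might look, not the paper's implementation; the metric names, the prompt text, and the `llm` callable are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def analyze(readings: Dict[str, List[float]]) -> List[dict]:
    """Deterministic step: compute mean and first-to-last change per metric,
    then rank metrics by the absolute size of the change."""
    facts = []
    for name, values in readings.items():
        mean = sum(values) / len(values)
        change = values[-1] - values[0]
        facts.append({"metric": name, "mean": round(mean, 1), "change": round(change, 1)})
    facts.sort(key=lambda f: abs(f["change"]), reverse=True)
    return facts


def build_prompt(facts: List[dict]) -> str:
    """Turn the precomputed facts into a prompt that only asks for wording."""
    lines = [f"- {f['metric']}: mean {f['mean']}, change {f['change']:+}" for f in facts]
    header = "Rewrite these precomputed facts as plain sentences. Do not alter any number."
    return header + "\n" + "\n".join(lines)


def generate(readings: Dict[str, List[float]], llm: Callable[[str], str]) -> str:
    """Run the deterministic analysis, then make exactly one LLM call for expression."""
    prompt = build_prompt(analyze(readings))
    return llm(prompt)


if __name__ == "__main__":
    # Hypothetical data; the echo function stands in for a real model client.
    readings = {"sleep_hours": [7.0, 7.5, 6.0], "resting_hr": [62.0, 64.0, 66.0]}
    print(generate(readings, llm=lambda prompt: prompt))
```

Because every number is computed before the model is called, the model cannot introduce arithmetic or ranking errors unless it disobeys the prompt, and the number of LLM calls per report is fixed at one.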

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance

Researchers propose a modular architecture for educational AI chatbots designed to enforce pedagogical principles and prevent negative learning outcomes. The approach addresses structural limitations in current monolithic LLM solutions by incorporating targeted modules at different exercise-solving stages, enabling more transparent and controlled student guidance.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Researchers present a modular LLM-based architecture for detecting and quantifying human values in text, addressing the need for ethical decision-making in autonomous AI systems. The approach separates value conceptualization from detection, enabling scalable application across different ethical frameworks and demonstrating strong performance on the ValueEval dataset.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Researchers introduce HGMem, a hypergraph-based working memory system that enhances multi-step retrieval-augmented generation (RAG) for large language models by modeling complex relational dependencies among facts. Unlike traditional RAG systems that treat memory as passive storage, HGMem dynamically structures information as interconnected high-order relationships, demonstrating improved performance on global sense-making benchmarks requiring complex reasoning across extended contexts.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

Researchers introduce DARE, a technique that reduces computational redundancy in Diffusion Language Models by reusing cached attention activations across tokens. The method achieves up to 1.20x per-layer latency improvements while maintaining generation quality, addressing efficiency gaps between diffusion-based and auto-regressive language models.

Page 1 of 2