#architecture-design News & Analysis

17 articles tagged with #architecture-design. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Position: Reasoning After Perception Means Reasoning Without Vision

Researchers challenge the assumption that language reasoning can compensate for vision-language model weaknesses, arguing that deferring visual reasoning to text collapses spatial information and degrades perception to passive encoding. The study introduces the Turing Eye Test to demonstrate that tasks requiring visual reasoning in pixel space cannot be solved through text-only reasoning, suggesting AI architectures must shift toward reasoning within perception rather than about it.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

Researchers present CTM-AI, a general-purpose AI architecture combining the Conscious Turing Machine model with modern foundation models to achieve human-like flexibility across tasks. The system demonstrates state-of-the-art performance on multimodal benchmarks and tool-using tasks, suggesting that consciousness-inspired architectures may offer a path toward more capable and adaptable AI systems.

AI · Neutral · arXiv – CS AI · Apr 20 · 7/10

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

A new survey examines intrinsic interpretability approaches for Large Language Models, categorizing design methods that build transparency directly into model architectures rather than applying post-hoc explanations. The research identifies five key paradigms—functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction—addressing the critical challenge of making LLMs more trustworthy and safer for deployment.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Repeated Shared Access Enables Grokking, but Edit Propagation Depends on a Fine-Grained Addressable Memory

Researchers compare four neural network architectures for factual knowledge propagation in question-answering systems, finding that repeated shared memory access enables out-of-distribution generalization ('grokking'), but only architectures with fine-grained addressable memory can effectively propagate edited facts. The study dissociates learning capability from editing affordance, revealing that looped computation and explicit memory mechanisms serve different functional purposes.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Protocol-Aware Tokenization and Architecture Co-Design for Wireless Packet Foundation Models

Researchers demonstrate that protocol-aware tokenization matters far more than model architecture for wireless packet foundation models: tokenizer design swings accuracy by 32 points, versus only 2 points for architectural changes. PLUME-DEEP achieves 98.2% accuracy with deeper layers, while PLUME-MAMBA offers faster inference at 96.1% accuracy.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Gated MLPs as Symmetry-Broken Rank-1 Bilinear Attention

Researchers demonstrate that gated MLPs can be mathematically understood as rank-1 approximations to bilinear attention mechanisms, with nonlinearity placement breaking symmetry properties. This theoretical framework provides new insight into why gated MLPs perform effectively in practice and offers guidance for designing improved neural network architectures.
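
As a rough illustration of the summary above (not the paper's own construction), the sketch below shows a generic gated MLP in plain Python. Each hidden unit multiplies two linear projections of the input, so without an activation it is a rank-1 bilinear form in `x` that is unchanged when the two branches are swapped. Applying a nonlinearity to only one branch breaks that symmetry. The function names, the SiLU activation, and the toy weights are all assumptions chosen for the example.

```python
import math


def matvec(weights, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def silu(z):
    """SiLU activation, assumed here as the gate nonlinearity."""
    return z / (1.0 + math.exp(-z))


def identity(z):
    """No activation: turns the gated MLP into a purely bilinear layer."""
    return z


def gated_mlp(x, w_gate, w_up, w_down, act=silu):
    """Compute w_down @ (act(w_gate @ x) * (w_up @ x))."""
    gate = [act(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy weights (hypothetical values, two hidden units).
x = [1.0, -2.0]
w_a = [[1.0, 0.0], [0.0, 1.0]]
w_b = [[0.5, 0.5], [1.0, -1.0]]
w_down = [[1.0, 1.0]]

# Without a nonlinearity, swapping the two branches changes nothing.
bilinear = gated_mlp(x, w_a, w_b, w_down, act=identity)
bilinear_swapped = gated_mlp(x, w_b, w_a, w_down, act=identity)

# With SiLU on the gate branch only, the swap changes the output.
gated = gated_mlp(x, w_a, w_b, w_down)
gated_swapped = gated_mlp(x, w_b, w_a, w_down)
```

Comparing `bilinear` with `bilinear_swapped`, and `gated` with `gated_swapped`, makes the symmetry and its breaking visible on a single input.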

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

Researchers introduce MOSS-Video-Preview, a cross-attention architecture enabling real-time video understanding, in which models process frames continuously and revise answers as new information arrives. The approach achieves a 5x speedup in time-to-first-token and 2.7x higher decoding throughput compared to decoder-only models, while maintaining competitive offline performance.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

An Abstract Architecture for Explainable Autonomy in Hazardous Environments

Researchers present an abstract architecture for building autonomous robotic systems that can explain their decision-making processes to human operators and regulators. The framework addresses the critical need for explainability in autonomous systems deployed in hazardous environments, with a practical application example in nuclear industry operations where trust and regulatory compliance are essential.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Statistical Priors for Implicit Preferences: Decoupling Skill Selection as a Local Harness in Personal Agents

Researchers propose a decoupled architecture for personal AI agents that separates statistical preference learning from semantic intent parsing, enabling lightweight local deployment. The approach uses localized statistical data to modulate remote LLM skill selection decisions, achieving lower regret and higher accuracy than traditional memory-augmented agents.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

Researchers demonstrate that padded transformers maintain consistent computational expressivity across various architectural choices, with numeric precision and model depth emerging as the primary factors determining capability. The findings establish formal equivalences between transformer models and circuit complexity classes, suggesting practical transformer designs are more robust than previously understood.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Parallax: Parameterized Local Linear Attention for Language Modeling

Researchers introduce Parallax, a scalable Local Linear Attention mechanism that improves upon traditional softmax attention in large language models by learning query-like projectors to probe key-value covariance. Pretraining experiments at 0.6B and 1.7B parameters demonstrate consistent perplexity improvements and downstream benchmark gains, with performance matching or exceeding FlashAttention while revealing novel architecture-optimizer codesign benefits with the Muon optimizer.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Revisiting Transformer Layer Parameterization Through Causal Energy Minimization

Researchers introduce Causal Energy Minimization (CEM), a theoretical framework that reinterprets Transformer layer architecture through energy-based optimization principles. The approach derives weight-tied attention and gated MLPs as gradient updates on energy functions, revealing new design spaces for parameter-efficient Transformer variants that maintain baseline performance at hundred-million-parameter scales.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models

Researchers conducted a systematic study comparing Vision-Language Models built with LLAMA-1, LLAMA-2, and LLAMA-3 backbones, finding that newer LLM architectures don't universally improve VLM performance and instead show task-dependent benefits. The findings reveal that performance gains vary significantly: visual question-answering tasks benefit from improved reasoning in newer models, while vision-heavy tasks see minimal gains from upgraded language backbones.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10

JPmHC Dynamical Isometry via Orthogonal Hyper-Connections

Researchers propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a new deep learning framework that improves upon existing Hyper-Connections by replacing identity skips with trainable linear mixers while controlling gradient conditioning. The framework addresses training instability and memory overhead issues in current deep learning architectures through constrained optimization on specific mathematical manifolds.

AI · Bullish · Lil'Log (Lilian Weng) · Aug 6 · 6/10

Neural Architecture Search

Neural Architecture Search (NAS) automates the design of neural network architectures to find optimal topologies for specific tasks. The approach systematically explores network architecture spaces through three key components: search space, search algorithms, and child model evolution strategies, potentially discovering better performing models than human-designed architectures.
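
To make the three pieces named in the summary concrete, here is a minimal sketch assuming the simplest possible setup: a tiny discrete search space, random search as the search algorithm, and a made-up scoring function standing in for training and validating each child model. The search space, the scoring formula, and all function names are hypothetical and do not come from the article.

```python
import itertools
import random

# Search space: every architecture is one choice per dimension (toy values).
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "gelu"],
}


def sample_architecture(rng):
    """Draw one candidate architecture uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate_child(arch):
    """Hypothetical proxy score; a real system would train and validate the child."""
    bonus = 0.02 if arch["activation"] == "gelu" else 0.0
    return 0.9 - 0.05 * abs(arch["depth"] - 4) - abs(arch["width"] - 128) / 1000 + bonus


def random_search(num_trials, seed=0):
    """Search algorithm: sample candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate_child(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def all_architectures():
    """Enumerate the full search space (only feasible because it is tiny)."""
    names = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[n] for n in names)):
        yield dict(zip(names, values))


best_arch, best_score = random_search(num_trials=50)
```

Random search is used here only because it is the shortest search algorithm to write down; reinforcement learning, evolutionary methods, or gradient-based search would replace `random_search` while leaving the other two pieces in place.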

AI · Neutral · arXiv – CS AI · May 12 · 4/10

S2P-Net: A Spectral-Spatial Polar Network for Rotation-Invariant Object Recognition in Low-Data Regimes

S2P-Net is a compact deep learning architecture designed to achieve rotation-invariant object recognition without data augmentation, evaluated against traditional CNN approaches. The work appears to be early-stage academic research focused on improving neural network efficiency in low-data scenarios.

AI · Neutral · arXiv – CS AI · Mar 27 · 5/10

NERO-Net: A Neuroevolutionary Approach for the Design of Adversarially Robust CNNs

Researchers developed NERO-Net, a neuroevolutionary approach to design convolutional neural networks with inherent resistance to adversarial attacks without requiring robust training methods. The evolved architecture achieved 47% adversarial accuracy and 93% clean accuracy on CIFAR-10, demonstrating that architectural design can provide intrinsic robustness against adversarial examples.