AIBullisharXiv – CS AI · May 77/10
🧠Researchers present CTM-AI, a general-purpose AI architecture combining the Conscious Turing Machine model with modern foundation models to achieve human-like flexibility across tasks. The system demonstrates state-of-the-art performance on multimodal benchmarks and tool-using tasks, suggesting that consciousness-inspired architectures may offer a path toward more capable and adaptable AI systems.
AINeutralarXiv – CS AI · Apr 207/10
🧠A new survey examines intrinsic interpretability approaches for Large Language Models, categorizing design methods that build transparency directly into model architectures rather than applying post-hoc explanations. The research identifies five key paradigms—functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction—addressing the critical challenge of making LLMs more trustworthy and safer for deployment.
AINeutralarXiv – CS AI · 3d ago6/10
🧠Researchers demonstrate that padded transformers maintain consistent computational expressivity across various architectural choices, with numeric precision and model depth emerging as the primary factors determining capability. The findings establish formal equivalences between transformer models and circuit complexity classes, suggesting practical transformer designs are more robust than previously understood.
AIBullisharXiv – CS AI · 6d ago6/10
🧠Researchers introduce Parallax, a scalable Local Linear Attention mechanism that improves upon traditional softmax attention in large language models by learning query-like projectors to probe key-value covariance. Pretraining experiments at 0.6B and 1.7B parameters demonstrate consistent perplexity improvements and downstream benchmark gains, with performance matching or exceeding FlashAttention while revealing novel architecture-optimizer codesign benefits with the Muon optimizer.
🏢 Perplexity
AINeutralarXiv – CS AI · May 116/10
🧠Researchers introduce Causal Energy Minimization (CEM), a theoretical framework that reinterprets Transformer layer architecture through energy-based optimization principles. The approach derives weight-tied attention and gated MLPs as gradient updates on energy functions, revealing new design spaces for parameter-efficient Transformer variants that maintain baseline performance at hundred-million-parameter scales.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers conducted a systematic study comparing Vision-Language Models built with LLAMA-1, LLAMA-2, and LLAMA-3 backbones, finding that newer LLM architectures don't universally improve VLM performance and instead show task-dependent benefits. The findings reveal that performance gains vary significantly: visual question-answering tasks benefit from improved reasoning in newer models, while vision-heavy tasks see minimal gains from upgraded language backbones.
AIBullisharXiv – CS AI · Mar 55/10
🧠Researchers propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a new deep learning framework that improves upon existing Hyper-Connections by replacing identity skips with trainable linear mixers while controlling gradient conditioning. The framework addresses training instability and memory overhead issues in current deep learning architectures through constrained optimization on specific mathematical manifolds.
AIBullishLil'Log (Lilian Weng) · Aug 66/10
🧠Neural Architecture Search (NAS) automates the design of neural network architectures to find optimal topologies for specific tasks. The approach systematically explores network architecture spaces through three key components: search space, search algorithms, and child model evolution strategies, potentially discovering better performing models than human-designed architectures.
AINeutralarXiv – CS AI · May 124/10
🧠S2P-Net introduces a compact deep learning architecture designed to achieve rotation-invariant object recognition without requiring data augmentation, with comparisons to traditional CNN approaches. This appears to be an early-stage academic work focused on improving neural network efficiency in low-data scenarios.
AINeutralarXiv – CS AI · Mar 275/10
🧠Researchers developed NERO-Net, a neuroevolutionary approach to design convolutional neural networks with inherent resistance to adversarial attacks without requiring robust training methods. The evolved architecture achieved 47% adversarial accuracy and 93% clean accuracy on CIFAR-10, demonstrating that architectural design can provide intrinsic robustness against adversarial examples.