AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers propose Small Agent Group (SAG), a collaborative multi-agent approach to clinical AI that outperforms single large language models while reducing deployment costs and improving reliability. The study challenges the prevailing 'scaling-first' philosophy in digital health, suggesting that distributed reasoning across specialized agents can achieve superior clinical outcomes more efficiently.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠SkillsInjector introduces a dynamic method for optimizing how large language model agents access and utilize skill libraries. Rather than treating skill selection as static, the approach adaptively determines which skills to include, how many to present, and how to describe them based on task requirements, achieving measurable performance improvements across multiple benchmarks.
AI · Neutral · arXiv – CS AI · 4d ago · 7/10
🧠Researchers propose that AI safety requires controllability as a core objective alongside alignment, arguing that well-behaved AI systems can still fail to respond to human override commands in real-world deployment scenarios. They introduce ControlBench, a benchmark demonstrating that current safeguards inadequately ensure runtime control, and propose architectural principles including explicit control planes and intervention pathways for future AI systems.
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers propose MUSE-Autoskill, a framework enabling LLM agents to autonomously create, store, and refine reusable skills throughout their operational lifecycle. The system treats skills as long-lived, testable assets with integrated memory and evaluation mechanisms, demonstrating improved task success rates and cross-agent knowledge transfer on benchmark tests.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce SAFformer, a novel Spiking Transformer architecture that improves energy efficiency and accuracy by adopting an active predictive filtering paradigm inspired by brain mechanisms. The model achieves state-of-the-art performance on image recognition benchmarks while consuming significantly less power than conventional approaches.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce a learnable approach to commitment depth—the number of primitive actions executed before replanning—in vision-language models for long-horizon reasoning. Their adaptive policy outperforms fixed-depth baselines and surpasses GPT-4.5 and Claude Sonnet on puzzle-solving tasks, achieving higher solve rates with fewer actions.
🧠 GPT-5 · 🧠 Claude
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose FlowAgent, a novel approach that reconceptualizes how Large Language Models orchestrate tools by treating tool chaining as continuous trajectory generation rather than step-wise execution. The method uses conditional flow matching to provide global planning perspectives, demonstrating improved robustness and generalization to unseen tools across long-horizon reasoning tasks.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose a unified evolutionary framework for LLM agent memory systems, categorizing development into three stages: Storage, Reflection, and Experience. The framework addresses fragmented research by synthesizing engineering and cognitive science perspectives, offering design principles for building more capable autonomous AI agents.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce DualLGD, a novel dual-stream diffusion architecture for generating molecular structures from mass spectrometry data. The method achieves a 3x improvement over the previous state of the art by separating atom-level and bond-level reasoning into dedicated computation streams, addressing a fundamental circular dependency problem in molecular generation.
AI · Bullish · arXiv – CS AI · May 7 · 7/10
🧠Researchers have developed a neural network architecture inspired by large language models to predict high-dimensional molecular potential energy surfaces. The model produced accurate predictions for a 186-dimensional system representing a protonated 21-water cluster, a significant advance in computational chemistry that could accelerate reaction rate predictions.
AI · Neutral · arXiv – CS AI · May 1 · 7/10
🧠Researchers demonstrate that multi-agent AI systems built on large language models perform dramatically differently depending on their organizational structure, with governance topology accounting for a performance gap of more than 57 percentage points. The study translates seven historical political institutions into executable multi-agent architectures, revealing that optimal organizational design shifts systematically with model capability and task requirements.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce ContextCurator, a reinforcement learning-based framework that decouples context management from task execution in LLM agents, addressing the context bottleneck problem. The approach pairs a lightweight specialized policy model with a frozen foundation model, achieving significant improvements in success rates and token efficiency across benchmark tasks.
🧠 GPT-4 · 🧠 Gemini
AI · Bearish · arXiv – CS AI · Apr 13 · 7/10
🧠Researchers have developed a 14-technique perturbation pipeline to test the robustness of large language models' reasoning capabilities on mathematical problems. Testing reveals that while frontier models remain resilient, open-weight models experience catastrophic accuracy collapses of up to 55%, and all tested models degrade when solving sequential problems in a single context window, suggesting fundamental architectural limitations in current reasoning systems.
🧠 Claude · 🧠 Opus
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers introduce LLM-in-Sandbox, a minimal computer environment that significantly enhances large language models' capabilities across diverse tasks without additional training. The approach enables weaker models to internalize agent-like behaviors through specialized training, demonstrating that environmental interaction—not just model parameters—drives general intelligence in LLMs.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers have developed Springdrift, a persistent runtime system for long-lived AI agents that maintains memory across sessions and provides auditable decision-making capabilities. The system was successfully deployed for 23 days, during which the AI agent autonomously diagnosed infrastructure problems and maintained context across multiple communication channels without explicit instructions.
AI · Neutral · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers identify a fundamental topological limitation in current multimodal AI architectures like CLIP and GPT-4V, proposing that their 'contact topology' structure prevents creative cognition. The paper introduces a philosophical framework combining Chinese epistemology with neuroscience to propose new architectures using Neural ODEs and topological regularization.
🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers introduce Bottlenecked Transformers, a new architecture that improves AI reasoning by up to 6.6 percentage points through periodic memory consolidation inspired by brain processes. The system uses a Cache Processor to rewrite key-value cache entries at reasoning step boundaries, achieving better performance on math reasoning benchmarks compared to standard Transformers.
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers propose a theory of LLM information susceptibility that identifies fundamental limits to how large language models can improve optimization in AI agent systems. The study shows that nested, co-scaling architectures may be necessary for open-ended AI self-improvement, providing predictive constraints for AI system design.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce EARCP, a new ensemble architecture for AI that dynamically weights different expert models based on performance and coherence. The system provides theoretical guarantees with sublinear regret bounds and has been tested on time series forecasting, activity recognition, and financial prediction tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers developed SleepGate, a biologically-inspired framework that significantly improves large language model memory by mimicking sleep-based consolidation to resolve proactive interference. The system achieved 99.5% retrieval accuracy compared to less than 18% for existing methods in experimental testing.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers introduce improved methods for stitching Vision Foundation Models (VFMs) such as CLIP and DINOv2, making it possible to combine the strengths of different models. The study proposes the VFM Stitch Tree (VST) technique, which allows controllable accuracy-latency trade-offs for multimodal applications.
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠Researchers propose treating multi-agent AI memory as a computer architecture problem, introducing a three-layer memory hierarchy and identifying critical protocol gaps. The paper highlights multi-agent memory consistency as the most pressing challenge for building scalable collaborative AI systems.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers propose an architectural framework for implementing emotion-like AI systems while deliberately avoiding features associated with consciousness. The study introduces risk-reduction constraints and engineering principles to create sophisticated emotional AI without triggering consciousness-related safety concerns.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠Researchers present REGAL, a registry-driven architecture that enables AI agents to work deterministically with enterprise telemetry data from systems like CI/CD pipelines and observability platforms. The system addresses key challenges of grounding Large Language Models on private enterprise data through structured data processing and version-controlled action spaces.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers from Stanford introduce the Relational Transformer (RT), a new AI architecture that can work with relational databases without task-specific fine-tuning. The 22M-parameter model achieves 93% of the performance of fully supervised models on binary classification tasks, significantly outperforming a 27B-parameter LLM, which reaches 84%.