AIBullisharXiv – CS AI · 10h ago7/10
🧠GSAM is a new robotic framework that improves articulated object manipulation through vision-based perception, VLM-based refinement with commonsense reasoning, and constraint-based planning to prevent collisions. In experiments across 50 hinge tasks, GSAM achieved 36% higher success rates and 3.1% lower standard deviation compared to existing baselines, demonstrating superior generalization and safety.
AIBullisharXiv – CS AI · 3d ago7/10
🧠Researchers introduce VikingMem, a memory management system for long-term LLM interactions that addresses context window limitations through selective memory extraction, stateful evolution, and temporal weighting. The system demonstrates 30% improvements in memory retrieval effectiveness while maintaining low latency, offering a generalizable solution across diverse applications beyond traditional chatbots.
AIBullisharXiv – CS AI · 3d ago7/10
🧠Researchers introduce proof-state snapshotting, a technique that accelerates automated theorem proving in Lean 4 by reusing elaborated proof states across parallel search branches instead of reconstructing them. The method achieves 5.6-50x speedups (averaging 14x) on benchmark problems, addressing a critical bottleneck where per-branch overhead from import loading and elaboration consumed over 99% of computation time.
AIBullisharXiv – CS AI · 5d ago7/10
🧠Researchers propose MUSE-Autoskill, a framework enabling LLM agents to autonomously create, store, and refine reusable skills throughout their operational lifecycle. The system treats skills as long-lived, testable assets with integrated memory and evaluation mechanisms, demonstrating improved task success rates and cross-agent knowledge transfer on benchmark tests.
AIBullisharXiv – CS AI · May 127/10
🧠SkillEvolver introduces a meta-learning framework that automatically improves AI agent skills through iterative refinement based on real-world deployment failures, achieving 56.8% accuracy on benchmark tasks compared to 43.6% for manually curated skills. The system learns by modifying skill prose and code rather than model weights, enabling seamless integration with any compatible agent without retraining.
AINeutralarXiv – CS AI · May 97/10
🧠A new research paper identifies authorization propagation as a critical but underexplored security problem in multi-agent AI systems, distinct from prompt injection vulnerabilities. The paper argues that identity governance must become foundational infrastructure in AI orchestration, with seven structural requirements for maintaining authorization invariants across distributed agent interactions.
AIBullisharXiv – CS AI · Apr 207/10
🧠Researchers propose a bilevel optimization framework using Monte Carlo Tree Search to systematically improve LLM agent skills—structured collections of instructions, tools, and resources. The framework optimizes both skill structure and component content simultaneously, demonstrating performance improvements on Operations Research tasks and addressing a previously unsolved challenge in agent design optimization.
AINeutralarXiv – CS AI · Apr 147/10
🧠A comprehensive comparative study traces the evolution of OpenAI's GPT models from GPT-3 through GPT-5, revealing that successive generations represent far more than incremental capability improvements. The research demonstrates a fundamental shift from simple text predictors to integrated, multimodal systems with tool access and workflow capabilities, while persistent limitations like hallucination and benchmark fragility remain largely unresolved across all versions.
🧠 GPT-4🧠 GPT-5
AIBullisharXiv – CS AI · Apr 67/10
🧠Researchers have developed Glia, an AI architecture using large language models in a multi-agent workflow to autonomously design computer systems mechanisms. The system generates interpretable designs for distributed GPU clusters that match human expert performance while providing novel insights into workload behavior.
AIBullisharXiv – CS AI · Mar 177/10
🧠Researchers introduce StatePlane, a model-agnostic cognitive state management system that enables AI systems to maintain coherent reasoning over long interaction horizons without expanding context windows or retraining models. The system uses episodic, semantic, and procedural memory mechanisms inspired by cognitive psychology to overcome current limitations in large language models.
AINeutralarXiv – CS AI · Mar 127/10
🧠A comprehensive study analyzing 896 academic papers and 80+ regulatory documents reveals critical ambiguities in how 'AI models' and 'AI systems' are defined across regulations like the EU AI Act. The research proposes clear operational definitions to resolve regulatory boundary problems that complicate responsibility allocation across the AI value chain.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed KernelSkill, a multi-agent framework that optimizes GPU kernel performance using expert knowledge rather than trial-and-error approaches. The system achieved 100% success rates and significant speedups (1.92x to 5.44x) over existing methods, addressing a critical bottleneck in AI system efficiency.
AIBullishOpenAI News · Mar 97/10
🧠OpenAI is acquiring Promptfoo, an AI security platform that specializes in helping enterprises identify and fix vulnerabilities in AI systems during the development process. This acquisition strengthens OpenAI's security capabilities and enterprise offerings.
🏢 OpenAI
AIBullisharXiv – CS AI · Mar 67/10
🧠Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.
AIBullisharXiv – CS AI · Mar 47/103
🧠Researchers propose a framework for sustainable AI self-evolution through triadic roles (Proposer, Solver, Verifier) that ensures learnable information gain across iterations. The study identifies three key system designs to prevent the common plateau effect in self-play AI systems: asymmetric co-evolution, capacity growth, and proactive information seeking.
AIBullisharXiv – CS AI · Mar 46/104
🧠Researchers introduce MASPOB, a bandit-based framework that optimizes prompts for Multi-Agent Systems using Graph Neural Networks to handle topology-induced coupling. The system reduces search complexity from exponential to linear while achieving state-of-the-art performance across benchmarks.
AIBullisharXiv – CS AI · 10h ago6/10
🧠Researchers introduce Iterative Regret-Minimization Fine-Tuning (Iterative RMFT), a post-training method that improves LLMs' decision-making capabilities by iteratively distilling low-regret trajectories back into models. The approach addresses fundamental limitations in how LLMs handle online decision problems without relying on rigid algorithmic templates, demonstrating improvements across multiple model architectures.
🧠 GPT-4
AINeutralarXiv – CS AI · 5d ago6/10
🧠Researchers introduce TADDLE, an AI system that detects quality deficiencies in LLM-generated peer reviews by decomposing analysis into specialized tools and multi-label classification. The work addresses a growing problem in academic publishing where AI-written reviews are fluent but potentially flawed, backed by the first expert-annotated benchmark of 1,800 reviews across six defect categories.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers introduce MemoRepair, a system that addresses cascade failures in agentic memory by preventing stale or invalidated information from corrupting downstream AI agent decisions. Using a barrier-first approach and graph-based optimization, the system reduces invalid memory exposure from 69-94% to 0% while maintaining 91-94% of valid successor states with significantly lower repair costs.
AINeutralAI News · Apr 146/10
🧠Hyundai Motor Group is pivoting toward physical AI systems, integrating artificial intelligence into robots and machinery designed to operate in real-world environments. The company's current focus centers on factory and industrial applications, signaling a major shift in how the automotive giant approaches automation and manufacturing technology.
AINeutralarXiv – CS AI · Apr 146/10
🧠A new benchmark study (RAGSearch) evaluates whether agentic search systems can reduce the need for expensive GraphRAG pipelines by dynamically retrieving information across multiple rounds. Results show agentic search significantly improves standard RAG performance and narrows the gap to GraphRAG, though GraphRAG retains advantages for complex multi-hop reasoning tasks when preprocessing costs are considered.
🏢 Meta
AIBullisharXiv – CS AI · Mar 276/10
🧠Researchers developed a novel Co-Regulation Design Agentic Loop (CRDAL) system that uses metacognitive agents to improve AI-driven engineering design by reducing design fixation. The system showed better performance than traditional approaches in battery pack design tasks without significantly increasing computational costs.
AINeutralarXiv – CS AI · Mar 166/10
🧠Researchers developed a new method to evaluate AI ethical reasoning using literary narratives from science fiction, testing 13 AI systems across 24 conditions. The study found that current AI systems perform surface-level ethical responses rather than genuine moral reasoning, with more sophisticated systems showing more complex failure modes.
🏢 Anthropic🏢 Microsoft🧠 Claude
AIBullishMarkTechPost · Mar 116/10
🧠This tutorial demonstrates building a Meta-Agent system that automatically designs and instantiates task-specific AI agents from simple descriptions. The system dynamically analyzes tasks, selects appropriate tools, configures memory architecture and planners, then creates fully functional agent runtimes without relying on static templates.
AIBullisharXiv – CS AI · Mar 37/109
🧠NeuroHex introduces a hexagonal coordinate system inspired by human brain grid cells to create highly efficient world models for adaptive AI systems. The framework achieves 90-99% reduction in geometric complexity while processing real-world map data, offering significant improvements for autonomous AI spatial reasoning and navigation.