AI · Bullish · arXiv – CS AI · 18h ago · 7/10
🧠SpecDB is an AI system that uses large language models to automatically generate customized relational databases tailored to specific workloads, rather than deploying uniform database systems across all use cases. The generated databases achieve comparable performance to PostgreSQL and MySQL while using only 3% of their code size, demonstrating the viability of AI-driven, purpose-built database synthesis.
AI · Bearish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers demonstrate that web retrieval in LLM agents significantly degrades safety alignment, with even safety-oriented sources increasing harmful compliance by 25%. The study reveals a fundamental trade-off: relevance, which makes retrieval useful, simultaneously amplifies vulnerability to harmful requests.
AI × Crypto · Bearish · arXiv – CS AI · 3d ago · 7/10
🤖A research paper argues that language model agents cannot support traditional reputation mechanisms because their mutable architecture—constantly changing models, prompts, and parameters—creates a fundamentally unstable identity that undermines trust signals. The authors propose shifting from identity-based, retroactive governance systems to protocol-based behavioral controls that operate before agents act.
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce MemCog, a new memory system for conversational AI agents that integrates memory access into the reasoning process rather than treating it as a separate tool. The system uses associative link graphs and proactive reasoning to enable agents to autonomously explore relevant information, achieving state-of-the-art performance on multiple benchmarks including a newly created ProactiveMemBench.
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠LACUNA is a new programming model that allows LLM agents to write code that shapes their own runtime environment while maintaining safety through type-checking and validation. The system rejects unsafe code before execution and uses compiler diagnostics to drive retries, achieving competitive performance on benchmark tests while preventing prompt injection and tool misuse attacks.
AI · Bearish · arXiv – CS AI · 5d ago · 7/10
🧠Researchers introduce RepoMirage, an evaluation suite that tests whether code agents truly understand repository context by applying perturbations to challenge their reasoning abilities. The study reveals a significant gap in how agents handle complex, multi-file code tasks, with performance dropping from 66.8% to 25.3% when explicit structural understanding is required.
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce MATRA, a threat modeling framework designed to systematically assess security risks in autonomous AI agent systems. The framework combines asset-based impact analysis with attack trees to quantify how LLM vulnerabilities translate into real-world deployment risks, demonstrating its effectiveness on an OpenClaw personal agent case study.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce MIND-Skill, an automated framework that generates reusable skills for LLM-powered AI agents by analyzing successful task trajectories. The system uses dual agents with quality-control mechanisms to create generalizable, documented procedures that enable autonomous systems to handle complex, multi-step problems without manual human expertise.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose that AI agents should invoke external tools only when epistemically necessary—when internal reasoning cannot reliably complete a task. The Theory of Agent framework treats tool use as a decision under uncertainty rather than a simple action optimization problem, arguing that unnecessary delegation wastes resources and prevents development of internal reasoning capabilities.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce Self-Programmed Execution (SPE), a novel agent architecture where language models act as their own orchestrators rather than following fixed turn-by-turn policies. The approach uses Spell, a Lisp-based language enabling self-editing programs, and demonstrates that frontier models can perform complex agentic tasks without specialized training.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce BeliefMem, a novel memory architecture for LLM agents that retains multiple candidate conclusions with associated probabilities instead of committing to single deterministic interpretations. This probabilistic approach preserves uncertainty, allows agents to update confidence as new evidence arrives, and demonstrates superior performance on LoCoMo and ALFWorld benchmarks compared to existing memory methods.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers propose a pipeline for dynamically generating persona-based AI agents at runtime, moving beyond fixed agent architectures to enable personalized multi-agent workflows. This approach allows agentic platforms to adapt agent roles, coordination patterns, and interaction flows to match individual user characteristics and contextual demands, opening new design paradigms for more flexible AI systems.
AI · Bearish · arXiv – CS AI · May 1 · 7/10
🧠Researchers argue that current AI agent memory systems (vector stores, RAG, scratchpads) perform lookup operations rather than true memory consolidation. As a result, agents accumulate notes indefinitely without developing expertise, hit a generalization ceiling on novel tasks, and remain vulnerable to persistent memory poisoning attacks. The paper draws on neuroscience's Complementary Learning Systems theory to show that biological intelligence pairs fast exemplar storage with slow weight consolidation, a dual mechanism that current AI systems lack.
AI · Bullish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers introduce dual-trace memory encoding for LLM agents, pairing factual records with narrative scene reconstructions to improve cross-session recall by 20+ percentage points. The method significantly enhances temporal reasoning and multi-session knowledge aggregation without increasing computational costs, advancing the capability of persistent AI agent systems.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce PAC-Bench, a benchmark for evaluating how AI agents collaborate while maintaining privacy constraints. The study reveals that privacy protections significantly degrade multi-agent system performance and identifies coordination failures as a critical unsolved challenge requiring new technical approaches.
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers have developed Declarative Model Interface (DMI), a new abstraction layer that transforms traditional GUIs into LLM-friendly interfaces for computer-use agents. Testing with the Microsoft Office Suite showed a 67% improvement in task success rates and a 43.5% reduction in interaction steps, with over 61% of tasks completed in a single LLM call.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers developed a two-agent defense system called OpenClaw that achieved a 0% attack success rate against prompt injection attacks on LLM applications. The system uses agent isolation and JSON formatting to structurally prevent malicious prompts from reaching action-taking agents.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers introduce the AI Search Paradigm, a comprehensive framework for next-generation search systems using four LLM-powered agents (Master, Planner, Executor, Writer) that collaborate to handle everything from simple queries to complex reasoning tasks. The system employs modular architecture with dynamic workflows for task planning, tool integration, and content synthesis to create more adaptive and scalable AI search capabilities.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠Researchers prove 'selection theorems' showing that AI agents achieving low regret on prediction tasks must develop internal predictive models and belief states. The work demonstrates that structured internal representations are mathematically necessary, not just helpful, for competent decision-making under uncertainty.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose replacing LLM-based triggers in proactive agent systems with a lightweight temporal graph learning (TGL) model that processes structured event streams directly. The approach achieves a 16.7% mean F1 improvement while running 4-7x faster on GPUs and 12-83x faster on consumer hardware, with a 220 MiB footprint suitable for on-device deployment.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce FluxMem, a memory framework for AI agents that treats memory as a continuously evolving graph rather than a static repository. The system dynamically refines memory connections through feedback and consolidation across three stages, achieving state-of-the-art results on multiple benchmarks.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce PEAM, a parametric memory framework for AI agents in Minecraft that consolidates learned skills directly into model parameters rather than relying on retrieval-based memory. The system uses a mixture-of-experts architecture with contrastive learning to internalize both successful and failed experiences, achieving better long-horizon task performance while avoiding catastrophic forgetting.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose a novel multimodal multi-agent framework that uses graph-based knowledge construction and adaptive retrieval-augmented generation to enable autonomous agents to execute complex workflows more effectively. The system combines offline discovery of workflow topology from execution logs with real-time collaborative verification, demonstrating improved performance in novel scenarios with limited training data.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers propose Governed Evolving Memory (GEM), a new paradigm for long-term AI agent memory that treats memory as a state-management workload rather than traditional database storage. The framework addresses four critical failure modes in current agent systems—unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval—through four state-level operators and six correctness conditions that operate at the trajectory level rather than individual records.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠SkillLens introduces a hierarchical framework for organizing and reusing skills in LLM agents at multiple granularity levels, reducing computational costs while maintaining relevance. The system retrieves and adapts skills selectively rather than injecting entire skill blocks, achieving measurable performance gains on benchmark tasks.