AI · Bearish · arXiv – CS AI · 6d ago · 7/10
🧠Researchers have identified a critical security vulnerability in agentic AI systems called cross-session stored prompt injection, where malicious instructions can persist within system state and compromise future interactions long after the attacker disconnects. This threat fundamentally differs from traditional prompt injection by leveraging long-lived system artifacts like memories and filesystems, transforming ephemeral model-level attacks into durable system-level vulnerabilities that accumulate over time.
AI · Neutral · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers present a monitoring methodology for agentic AI systems still in early production stages, where structural integration defects rather than task-level errors cause most failures. The approach uses variance-based characterization across three monitoring scopes to identify and triage issues, finding that task-level error detection is often masked by underlying system architecture problems.
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduced AgentCollabBench, a diagnostic benchmark revealing critical vulnerabilities in multi-agent AI systems where constraints silently fail during peer collaboration. The study demonstrates that communication topology, not model capability alone, determines whether safeguards survive information handoffs between agents, exposing structural weaknesses invisible to standard outcome-based evaluation.
🧠 GPT-4 · 🧠 Gemini · 🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers propose AI-Driven Research for Systems (ADRS), a framework using large language models to automate database optimization by generating and evaluating hundreds of candidate solutions. By co-evolving evaluators with solutions, the team demonstrates discovery of novel algorithms achieving up to 6.8x latency improvements over existing baselines in buffer management, query rewriting, and index selection tasks.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠This technical guide presents twelve practical recommendations for designing AI-driven high-performance computing (HPC) workflows that balance the iterative, probabilistic nature of modern AI with traditional HPC infrastructure. The article addresses critical system-level challenges including containerization, resource management, and I/O optimization, providing researchers with a framework to transition from rigid computational pipelines to adaptive, intelligent environments.
AI · Neutral · arXiv – CS AI · Jun 1 · 6/10
🧠Researchers introduce CoSee, an auditing framework for analyzing failure modes in collaborative visual reasoning systems using resource-constrained language models (4B-8B parameters). The study reveals that shared working memory architectures paradoxically amplify hallucinations rather than improve performance, identifying two critical failure modes: noise reinforcement and policy collapse.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers introduce MemFail, a diagnostic benchmark for testing failure modes in LLM memory systems by isolating three core operations: summarization, storage, and retrieval. The benchmark evaluates state-of-the-art memory systems across five adversarially-designed datasets to empirically understand architectural tradeoffs, moving beyond aggregate accuracy metrics.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠VeriTrans is a machine learning system that converts natural language requirements into formal logic suitable for automated solvers, using a validator-gated pipeline to ensure reliability. The system combines fine-tuned language models with round-trip verification and deterministic execution, reaches 94.46% correctness on 2,100 specifications, and enables auditable translation for critical applications.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce AEG, a bare-metal runtime framework that enables high-performance machine learning inference on heterogeneous AI accelerators without OS overhead. The system achieves 9.2× higher compute efficiency and uses 11× fewer hardware tiles than Linux-based alternatives, demonstrating significant potential for edge AI deployment optimization.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced a multi-agent AI framework for whole-system software optimization that goes beyond local code improvements to analyze entire microservice architectures. The system uses coordinated agents for summarization, analysis, optimization, and verification, achieving 36.58% throughput improvement and 27.81% response time reduction in proof-of-concept testing.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers propose QuickGrasp, a video-language querying system that combines local processing with edge computing to achieve both fast response times and high accuracy. The system achieves up to 12.8x reduction in response delay while maintaining the accuracy of large video-language models through accelerated tokenization and adaptive edge augmentation.