7 articles tagged with #formal-methods. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv — CS AI · Apr 10 · 7/10
🧠 Researchers propose a shift from deterministic to probabilistic safety verification for embodied AI systems, arguing that provable probabilistic guarantees offer a more practical path to large-scale deployment in safety-critical applications like autonomous vehicles and robotics than the infeasible goal of absolute safety across all scenarios.
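The flavor of guarantee described above can be illustrated with a standard statistical-verification pattern: estimate a failure rate by Monte Carlo rollouts and report a high-confidence upper bound via a Hoeffding term. This is a minimal sketch of the general idea, not the paper's actual method; the `policy` and `scenario_sampler` callables here are hypothetical placeholders.

```python
import math
import random

def estimate_failure_bound(policy, scenario_sampler, n_trials=10_000, delta=1e-3):
    """Upper confidence bound on a policy's failure probability.

    Runs n_trials random scenarios, counts failures, and adds a Hoeffding
    concentration term so that, with probability >= 1 - delta, the true
    failure rate is below the returned value. (Illustrative sketch only.)
    """
    failures = sum(1 for _ in range(n_trials) if not policy(scenario_sampler()))
    empirical = failures / n_trials
    slack = math.sqrt(math.log(1 / delta) / (2 * n_trials))
    return empirical + slack

# Toy stand-in: a "policy" that is safe unless the sampled scenario exceeds 0.99.
random.seed(0)
bound = estimate_failure_bound(lambda s: s < 0.99, random.random)
```

The point of the probabilistic framing is visible here: instead of proving the policy never fails, we certify "failure probability below `bound`, with confidence 1 − δ", which remains meaningful as scenario spaces grow too large to enumerate.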
AI · Bearish · arXiv — CS AI · Apr 7 · 7/10
🧠 Researchers prove a fundamental theoretical limit in AI safety verification using Kolmogorov complexity theory. They demonstrate that no finite formal verifier can certify all policy-compliant AI instances of arbitrarily high complexity, revealing intrinsic information-theoretic barriers beyond computational constraints.
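For context, the classical Chaitin incompleteness theorem is the prototype of barriers of this kind (the paper's precise statement may differ): a sound, finitely axiomatized formal system can certify high Kolmogorov complexity only up to a constant that depends on the system itself.

```latex
% Chaitin-style barrier (schematic): for any sound formal verifier V,
% there is a constant c_V (roughly the description length of V) such that
%   V \nvdash \; K(x) > c_V \quad \text{for every string } x,
% even though all but finitely many x satisfy K(x) > c_V.
```

Intuitively, a finite verifier cannot vouch for objects whose complexity exceeds its own description, which is the information-theoretic obstruction the summary alludes to.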
AI · Bullish · arXiv — CS AI · Apr 6 · 7/10
🧠 SentinelAgent introduces a formal framework for securing multi-agent AI systems through verifiable delegation chains, achieving 100% accuracy in testing with zero false positives. The system uses seven verification properties and a non-LLM authority service to ensure secure delegation between AI agents in federated environments.
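A verifiable delegation chain can be sketched with standard-library HMACs: each hop is signed by the delegator, and each hop's delegatee must be the next hop's delegator. This is a hypothetical minimal structure, far simpler than SentinelAgent's seven verification properties or its authority service.

```python
import hashlib
import hmac

def sign(key: bytes, msg: bytes) -> bytes:
    """Sign a delegation message with the delegator's secret key."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_chain(chain, keys):
    """Check a delegation chain link by link.

    chain: list of (delegator, delegatee, signature) tuples.
    keys:  mapping from delegator name to its secret key.
    Each link must carry a valid signature, and consecutive links must
    connect (hop i's delegatee is hop i+1's delegator).
    """
    for i, (delegator, delegatee, sig) in enumerate(chain):
        msg = f"{delegator}->{delegatee}".encode()
        if not hmac.compare_digest(sig, sign(keys[delegator], msg)):
            return False  # forged or tampered link
        if i + 1 < len(chain) and chain[i + 1][0] != delegatee:
            return False  # broken chain: next hop was not delegated here
    return True

keys = {"orchestrator": b"k1", "planner": b"k2"}
good = [("orchestrator", "planner", sign(b"k1", b"orchestrator->planner")),
        ("planner", "executor", sign(b"k2", b"planner->executor"))]
bad = [("orchestrator", "planner", sign(b"wrong", b"orchestrator->planner"))]
```

The design choice worth noting is the same one the paper makes: verification is deterministic cryptography, not an LLM judgment, so a forged link fails closed rather than being argued about.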
AI · Neutral · arXiv — CS AI · Apr 20 · 6/10
🧠 Researchers introduce DPrivBench, a benchmark for evaluating how well large language models can reason about differential privacy algorithms and verify their correctness. Testing shows current LLMs handle basic DP mechanisms competently but fail significantly on advanced algorithms, exposing critical gaps in automated privacy reasoning capabilities.
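For a sense of the "basic mechanisms" such a benchmark would cover, here is the classic ε-differentially-private Laplace mechanism: add noise with scale sensitivity/ε to a query result. This is textbook DP, not code from the paper.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with epsilon-DP Laplace noise.

    Noise scale is sensitivity / epsilon; the Laplace sample is drawn
    by inverse-CDF from a uniform draw u in [-0.5, 0.5).
    """
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_value + noise

# Averaging many noisy releases recovers the true value, since the noise
# is zero-mean; any single release is privacy-protected.
random.seed(1)
avg = sum(laplace_mechanism(10.0, 1.0, 1.0) for _ in range(20_000)) / 20_000
```

Verifying correctness means checking exactly the details an LLM can fumble: that the scale is sensitivity/ε (not ε/sensitivity), that sensitivity matches the query, and that composition across queries is accounted for.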
AI · Neutral · arXiv — CS AI · Apr 15 · 6/10
🧠 Researchers introduce Text2Model and Text2Zinc, frameworks that use large language models to translate natural language descriptions into formal optimization and satisfaction models. The work represents the first unified approach combining both problem types with a solver-agnostic architecture, though experiments reveal LLMs remain imperfect at this task despite showing competitive performance.
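To make the translation target concrete, here is the kind of formal satisfaction model a Text2Model-style pipeline might emit for the sentence "assign each of three tasks to one of two machines so that no machine gets every task" — rendered as brute-force Python rather than the solver-level languages (e.g. MiniZinc) the papers actually target; the task and machine names are hypothetical.

```python
from itertools import product

# Decision variables: one machine per task.
tasks = ["t1", "t2", "t3"]
machines = ["m1", "m2"]

def satisfies(assignment):
    """Constraint from the natural-language spec: not all tasks on one machine."""
    return len(set(assignment.values())) > 1

# Enumerate the full assignment space and keep satisfying solutions.
solutions = [dict(zip(tasks, combo))
             for combo in product(machines, repeat=len(tasks))
             if satisfies(dict(zip(tasks, combo)))]
```

The LLM's job in these frameworks is exactly the step shown informally here: identify the decision variables, their domains, and the constraints from prose, then emit them in a solver-agnostic model.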
AI × Crypto · Bullish · arXiv — CS AI · Apr 13 · 6/10
🤖 SPEAR is a multi-agent AI framework designed to automate smart contract auditing through coordinated specialist agents that prioritize contracts, allocate tasks, and recover from failures autonomously. The research demonstrates how established multi-agent system patterns can improve security analysis workflows beyond centralized or pipeline-based approaches.
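The contract-prioritization step can be sketched as a priority queue over a risk score. The scoring rule below (value at stake times a complexity weight) and the contract names are invented for illustration; SPEAR's actual prioritization is agentic, not a fixed formula.

```python
import heapq

def triage(contracts):
    """Order contracts for audit, highest risk score first.

    contracts: list of (name, value_at_stake, complexity_weight) tuples.
    Uses a max-heap via negated scores (hypothetical scoring rule).
    """
    heap = [(-value * complexity, name) for name, value, complexity in contracts]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

order = triage([
    ("vault", 10_000_000, 0.9),   # high value, simple code
    ("faucet", 1_000, 0.2),       # low stakes
    ("bridge", 5_000_000, 3.0),   # cross-chain complexity dominates
])
```

A queue like this is what lets specialist agents pull the next-riskiest contract independently, which is the decentralized-coordination pattern the paper contrasts with a central pipeline.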
AI · Bullish · arXiv — CS AI · Mar 12 · 6/10
🧠 Researchers introduce FAME (Formal Abstract Minimal Explanations), a new method for explaining neural network decisions that scales to large networks while producing smaller explanations. The approach uses abstract interpretation and dedicated perturbation domains to eliminate irrelevant features and converge to minimal explanations more efficiently than existing methods.
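The notion of a minimal explanation can be demonstrated with a brute-force greedy contraction: try dropping each feature, and keep the drop only if the model's prediction stays fixed no matter how the freed features vary. FAME replaces the exhaustive check below with abstract interpretation so it scales to real networks; this toy version assumes binary features and is purely illustrative.

```python
from itertools import product

def minimal_explanation(model, x, features):
    """Greedily shrink the feature set while the prediction stays fixed.

    A feature set S 'fixes' the prediction if model output is constant
    over all binary values of the features outside S (exhaustive check,
    exponential in the number of freed features).
    """
    def fixed(kept):
        free = [f for f in features if f not in kept]
        preds = {model({**{f: x[f] for f in kept}, **dict(zip(free, vals))})
                 for vals in product([0, 1], repeat=len(free))}
        return len(preds) == 1

    kept = set(features)
    for f in list(features):            # try eliminating each feature in turn
        if fixed(kept - {f}):
            kept.discard(f)
    return kept

# For an OR of a and b, with a=1 in the input, {a} alone fixes the output.
expl = minimal_explanation(lambda v: v["a"] or v["b"],
                           {"a": 1, "b": 0, "c": 1},
                           ["a", "b", "c"])
```

The exponential `fixed` check is exactly the bottleneck the paper attacks: abstract perturbation domains over-approximate "all values of the freed features" in one pass instead of enumerating them.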