DeFi · Bullish · Bankless · 2d ago · 7/10
💎TamaSwap launches as the first decentralized exchange built with Verity, a smart contract language engineered for formal verification and provable security. This development represents a significant step toward eliminating smart contract vulnerabilities that have historically plagued DeFi platforms.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers propose Proof-Constrained Action (ePCA), a formal verification framework that requires AI agents to express intentions as mathematical constraints before executing actions, eliminating reliance on semantic guardrails. The approach achieves zero attack success rates in testing and addresses critical security gaps as LLMs evolve from text generators into autonomous agents with real-world execution capabilities.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers introduce proof-state snapshotting, a technique that accelerates automated theorem proving in Lean 4 by reusing elaborated proof states across parallel search branches instead of reconstructing them. The method achieves 5.6-50x speedups (averaging 14x) on benchmark problems, addressing a critical bottleneck where per-branch overhead from import loading and elaboration consumed over 99% of computation time.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers have developed AutoformBot, a multi-agent AI system that automatically translates informal mathematics textbooks into machine-verified formal proofs in Lean 4. The team successfully formalized 26 open-access textbooks into a library called Atlas containing over 45,000 declarations and 500,000 lines of verified code, demonstrating that large-scale automated mathematics formalization is now economically viable.
AI · Neutral · arXiv – CS AI · 3d ago · 7/10
🧠Researchers propose the SMARt framework, a four-layer autonomous AI system architecture that manages failures through formal escalation protocols rather than relying solely on model improvements. The framework enables AI agents to detect uncertainty, suspend operations, attempt recovery, and surrender control when reliability diminishes, addressing the fundamental architectural vulnerability of unbounded autonomy in deployed agentic systems.
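The escalation sequence described above (detect uncertainty, suspend, attempt recovery, surrender control) can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the mode names, the `next_mode` function, the confidence threshold, and the retry limit are all hypothetical and are not taken from the SMARt paper.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical operating modes for an escalating agent."""
    ACTIVE = "active"
    SUSPENDED = "suspended"
    RECOVERING = "recovering"
    SURRENDERED = "surrendered"


def next_mode(mode, confidence, recovery_attempts,
              threshold=0.7, max_attempts=2):
    """Return the next mode given a confidence estimate in [0, 1].

    threshold and max_attempts are illustrative values, not from the source.
    """
    if mode is Mode.SURRENDERED:
        # Control has been handed over; the agent does not resume on its own.
        return Mode.SURRENDERED
    if mode is Mode.ACTIVE:
        # Detect uncertainty: low confidence suspends operations.
        return Mode.ACTIVE if confidence >= threshold else Mode.SUSPENDED
    if mode is Mode.SUSPENDED:
        # A suspended agent always tries recovery before giving up.
        return Mode.RECOVERING
    # mode is Mode.RECOVERING
    if confidence >= threshold:
        return Mode.ACTIVE
    if recovery_attempts >= max_attempts:
        # Recovery budget exhausted: surrender control.
        return Mode.SURRENDERED
    return Mode.RECOVERING
```

The point of the sketch is that the exit from autonomy is an explicit transition with a bounded retry budget, rather than something left to the model's own judgment.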
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers present alpha-beta-CROWN, a neural network verification framework that enables formal verification of learning-based controllers in safety-critical systems. The tool addresses scalability challenges in verifying controller properties like stability and safety by computing certified bounds on nonlinear functions and using GPU parallelization for complex verification tasks.
AI × Crypto · Bullish · Bankless · May 18 · 7/10
🤖Vitalik Buterin advocates for AI-powered formal verification as a security advancement for cryptocurrency systems. The Ethereum co-founder believes integrating AI-assisted verification tools can strengthen cryptographic security and reduce vulnerabilities in blockchain infrastructure.
$ETH
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Shepherd is a new runtime substrate that enables meta-agents to supervise and optimize other agents through formalized execution traces, achieving 5x faster forking than Docker and demonstrating measurable improvements in coding assistance, optimization, and reinforcement learning tasks. The open-source system mechanizes core operations in Lean and enables replay, branching, and counterfactual exploration of agent behaviors.
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce containment verification, a formal verification approach that embeds safety guarantees directly into agentic AI frameworks rather than relying on model alignment. The team demonstrated the paradigm by verifying PocketFlow, an LLM framework, using Dafny formal methods—marking the first deductive verification of an agentic framework with safety properties independent of model capabilities.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers prove a fundamental mathematical incompatibility between accuracy, trust, and human-level reasoning in AI systems, demonstrating that systems designed to never make false claims cannot solve certain problems that humans can easily solve. The findings parallel Gödel's incompleteness theorems and establish formal limitations on what AI systems can achieve regardless of computational power.
AI · Bullish · arXiv – CS AI · May 7 · 7/10
🧠Researchers introduce LAWS, a self-certifying caching architecture for neural inference that builds a library of expert functions with formal error bounds, enabling efficient deployment across LLMs, robotics, and edge devices. The system generalizes both Mixture-of-Experts and KV prefix caching while providing mathematically verifiable performance guarantees without requiring ground truth validation.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce ANCORA, a self-play framework enabling language models to generate verifiable problems, solve them, and improve without human supervision. The method achieves 81.5% pass rate on Dafny2Verus tasks, significantly outperforming baseline approaches and demonstrating advances in autonomous AI reasoning capabilities.
AI · Bearish · arXiv – CS AI · May 1 · 7/10
🧠Researchers present a formal framework proving that AI governance systems structurally fail when expressiveness boundaries (what AI can do) and governance boundaries (what's regulated) are defined independently, creating inevitable gaps. The paper proposes 'coterminous governance'—aligning these boundaries through architectural separation of computation from effects—as the only viable solution, with proofs mechanized in Coq.
AI · Bullish · arXiv – CS AI · Apr 13 · 7/10
🧠Researchers introduce SafeAdapt, a novel framework for updating reinforcement learning policies while maintaining provable safety guarantees across changing environments. The approach uses a 'Rashomon set' to identify safe parameter regions and projects policy updates onto this certified space, addressing the critical challenge of deploying RL agents in safety-critical applications where dynamics and objectives evolve over time.
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠ClawLess introduces a formally verified security framework that enforces policies on AI agents operating with code execution and information retrieval capabilities, addressing risks that existing training-based approaches cannot adequately mitigate. The system uses BPF-based syscall interception and a user-space kernel to prevent adversarial AI agents from violating security boundaries, regardless of their internal design.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers have developed LeanTutor, a proof-of-concept AI system that combines Large Language Models with theorem provers to create a mathematically verified proof tutor. The system features three modules for autoformalization, proof-checking, and natural language feedback, evaluated using PeanoBench, a new dataset of 371 Peano Arithmetic proofs.
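For readers unfamiliar with what a machine-checked Peano Arithmetic proof looks like, here is a minimal Lean 4 sketch. It is not drawn from PeanoBench or LeanTutor; the type `Peano`, the function `add`, and the theorem `zero_add` are illustrative names defined from scratch, assuming no imports beyond core Lean.

```lean
-- A hand-rolled Peano natural number type (illustrative, not from PeanoBench).
inductive Peano where
  | zero : Peano
  | succ : Peano → Peano

namespace Peano

-- Addition defined by recursion on the second argument.
def add : Peano → Peano → Peano
  | n, zero => n
  | n, succ m => succ (add n m)

-- Adding zero on the left is the identity; proved by induction on n.
theorem zero_add (n : Peano) : add zero n = n := by
  induction n with
  | zero => rfl
  | succ m ih => simp [add, ih]

end Peano
```

A tutor of the kind described would check each student step against a proof state like the one Lean produces inside the `succ` case, where the induction hypothesis `ih` is available.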
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠VeriStruct is a new AI framework that automates formal verification of complex data structure modules in the Verus programming language. The system achieved a 99.2% success rate in verifying 128 out of 129 functions across eleven Rust data structure modules, representing significant progress in AI-assisted formal verification.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce IoUCert, a new formal verification framework that enables robustness verification for anchor-based object detection models like SSD, YOLOv2, and YOLOv3. The breakthrough uses novel coordinate transformations and Interval Bound Propagation to overcome previous limitations in verifying object detection systems against input perturbations.
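Interval Bound Propagation, mentioned above, pushes a lower and upper bound for every input through the network layer by layer. The sketch below shows only the generic technique for one affine layer followed by ReLU; it does not reproduce IoUCert's coordinate transformations or anything specific to object detection, and the function names `affine_bounds` and `relu_bounds` are hypothetical.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate elementwise bounds through y = W x + b.

    A positive weight maps the input lower bound to the output lower bound;
    a negative weight swaps the roles of the two bounds.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Input x = (1.0, 0.0) with an L-infinity perturbation of radius 0.5.
x_lower, x_upper = [0.5, -0.5], [1.5, 0.5]
W = [[1.0, -2.0], [-1.0, 1.0]]
b = [0.0, 0.5]

pre_lower, pre_upper = affine_bounds(x_lower, x_upper, W, b)
post_lower, post_upper = relu_bounds(pre_lower, pre_upper)
print(post_lower, post_upper)  # [0.0, 0.0] [2.5, 0.5]
```

Every output the layer can produce for a perturbed input is guaranteed to lie inside the printed box, which is what makes the bounds usable for robustness certificates, at the cost of some looseness.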
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers have developed a framework that allows neural network verification tools to accept natural language specifications instead of low-level technical constraints. The system automatically translates human-readable requirements into formal verification queries, significantly expanding the practical applicability of neural network verification across diverse domains.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers have developed an agentic AI-driven workflow using Large Language Models to automate coverage analysis for formal verification in integrated chip development. The approach systematically identifies coverage gaps and generates required formal properties, demonstrating measurable improvements in coverage metrics that correlate with design complexity.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers have enhanced the Saarthi AI framework for formal verification, achieving 70% better accuracy in generating SystemVerilog assertions and 50% fewer iterations to reach coverage closure. The framework uses multi-agent collaboration and improved RAG techniques to move toward domain-specific AI intelligence for verification tasks.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers introduced LeanCat, a benchmark comprising 100 category-theory tasks in Lean to test AI's formal theorem proving capabilities. State-of-the-art models achieved only 12% success rates, revealing significant limitations in abstract mathematical reasoning, while a new retrieval-augmented approach doubled performance to 24%.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers introduce Agent Behavioral Contracts (ABC), a formal framework for specifying and enforcing reliable behavior in autonomous AI agents. The system addresses critical issues of drift and governance failures in AI deployments by implementing runtime-enforceable contracts that achieve 88-100% compliance rates and significantly improve violation detection.
AI · Bullish · OpenAI News · Feb 2 · 7/10
🧠Researchers have developed a neural theorem prover for Lean that successfully solved challenging high-school mathematics olympiad problems, including those from AMC12, AIME competitions, and two problems adapted from the International Mathematical Olympiad (IMO). This represents a significant advancement in AI's ability to handle formal mathematical reasoning and proof generation.