34 articles tagged with #formal-verification. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers introduce SafeAdapt, a novel framework for updating reinforcement learning policies while maintaining provable safety guarantees across changing environments. The approach uses a 'Rashomon set' to identify safe parameter regions and projects policy updates onto this certified space, addressing the critical challenge of deploying RL agents in safety-critical applications where dynamics and objectives evolve over time.
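The core idea — take an ordinary policy-gradient step, then project the result back into a certified-safe parameter region — can be sketched as follows. This is a minimal illustration assuming an axis-aligned box as the safe set; SafeAdapt's actual Rashomon-set construction and safety certificates are not shown in the summary.

```python
import numpy as np

def project_to_safe_set(theta, lo, hi):
    """Project policy parameters onto a certified axis-aligned box.

    The box [lo, hi] stands in for the certified safe parameter region;
    a real Rashomon set would be derived from safety certificates, not
    fixed bounds.
    """
    return np.clip(theta, lo, hi)

def safe_update(theta, grad, lr, lo, hi):
    """One gradient step, followed by projection onto the safe set."""
    return project_to_safe_set(theta - lr * grad, lo, hi)

theta = np.array([0.9, -0.2])
grad = np.array([-1.0, 2.0])          # pushes theta outside the box
lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
theta_new = safe_update(theta, grad, lr=0.5, lo=lo, hi=hi)
print(theta_new)  # stays inside [-1, 1]^2
```

The projection guarantees every deployed parameter vector lies in the certified region, whatever the raw update does.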
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠ClawLess introduces a formally verified security framework that enforces policies on AI agents operating with code execution and information retrieval capabilities, addressing risks that existing training-based approaches cannot adequately mitigate. The system uses BPF-based syscall interception and a user-space kernel to prevent adversarial AI agents from violating security boundaries, regardless of their internal design.
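The enforcement model — a deny-by-default syscall policy applied regardless of the agent's internals — can be sketched as plain decision logic. This is a hypothetical policy table for illustration only; ClawLess enforces its policies in-kernel via BPF, which this user-space sketch does not model.

```python
# Hypothetical policy table: syscall name -> allowed for the agent?
# ClawLess enforces this via BPF-based syscall interception; this
# sketch only models the allow/deny decision in user space.
POLICY = {
    "read": True,
    "write": True,
    "openat": True,
    "execve": False,   # no spawning new programs
    "connect": False,  # no outbound network
}

def check_syscall(name: str) -> bool:
    """Deny-by-default: anything not explicitly allowed is blocked."""
    return POLICY.get(name, False)

print(check_syscall("read"))    # True
print(check_syscall("execve"))  # False
print(check_syscall("ptrace"))  # False (unlisted -> denied)
```

Deny-by-default is what makes the guarantee independent of the agent's design: an adversarial agent cannot reach a syscall the policy never granted.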
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.
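The "threshold input" phenomenon follows from continuity: along any continuous path from a benign prompt to a harmful one in a connected prompt space, a continuous defense score must cross its decision boundary somewhere, and bisection finds that crossing. The score function below is an invented stand-in purely to illustrate the intermediate-value argument, not the paper's construction.

```python
import math

def defense_score(x):
    """A stand-in continuous defense: x in [0, 1] parameterizes a path
    from a benign prompt (x=0) to a harmful one (x=1); score >= 0
    means 'block'. Hypothetical smooth function for illustration."""
    return math.tanh(6 * (x - 0.42))

def threshold(f, lo=0.0, hi=1.0, tol=1e-9):
    """Bisection: f continuous with f(lo) < 0 <= f(hi) implies a
    crossing point -- the threshold input where the defense flips."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_star = threshold(defense_score)
print(round(x_star, 3))  # near-identical inputs straddle the decision
```

Inputs arbitrarily close to `x_star` on either side get opposite treatment, which is exactly the failure mode the trilemma says every continuous defense must exhibit somewhere.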
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers have developed LeanTutor, a proof-of-concept AI system that combines Large Language Models with theorem provers to create a mathematically verified proof tutor. The system features three modules for autoformalization, proof-checking, and natural language feedback, evaluated using PeanoBench, a new dataset of 371 Peano Arithmetic proofs.
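For flavor, the kind of stepwise Peano Arithmetic proof such a tutor must check might look like the following minimal Lean proof by induction (illustrative only; not drawn from PeanoBench).

```lean
-- A minimal Peano-style proof by induction, of the kind a proof
-- tutor must verify step by step (illustrative; not from PeanoBench).
theorem zero_add_peano (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

Each tactic line is a checkable step, which is what lets the proof-checking module give feedback at the exact point a student's argument breaks down.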
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers have developed an agentic AI-driven workflow using Large Language Models to automate coverage analysis for formal verification in integrated chip development. The approach systematically identifies coverage gaps and generates required formal properties, demonstrating measurable improvements in coverage metrics that correlate with design complexity.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers have enhanced the Saarthi AI framework for formal verification, achieving 70% better accuracy in generating SystemVerilog assertions and 50% fewer iterations to reach coverage closure. The framework uses multi-agent collaboration and improved RAG techniques to move toward domain-specific AI intelligence for verification tasks.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers have developed a framework that allows neural network verification tools to accept natural language specifications instead of low-level technical constraints. The system automatically translates human-readable requirements into formal verification queries, significantly expanding the practical applicability of neural network verification across diverse domains.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce IoUCert, a new formal verification framework that enables robustness verification for anchor-based object detection models like SSD, YOLOv2, and YOLOv3. The breakthrough uses novel coordinate transformations and Interval Bound Propagation to overcome previous limitations in verifying object detection systems against input perturbations.
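Interval Bound Propagation, the verification core mentioned here, pushes an input interval through the network layer by layer. Below is a standard IBP step through a single linear layer; IoUCert's coordinate transformations for box regression are not shown in the summary and are not modeled.

```python
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Interval Bound Propagation through y = W @ x + b.

    Splitting W into positive and negative parts yields output bounds
    that are sound for every x with lo <= x <= hi.
    """
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

# A perturbation ball around an input: x in [0.9, 1.1] x [-0.1, 0.1]
lo = np.array([0.9, -0.1])
hi = np.array([1.1, 0.1])
W = np.array([[1.0, -2.0]])
b = np.array([0.5])
out_lo, out_hi = ibp_linear(lo, hi, W, b)
print(out_lo, out_hi)  # every reachable output lies in this interval
```

Chaining such steps through every layer gives certified output bounds under any input perturbation in the ball, which is what robustness verification of a detector reduces to.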
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠VeriStruct is a new AI framework that automates formal verification of complex data structure modules in the Verus programming language. The system achieved a 99.2% success rate in verifying 128 out of 129 functions across eleven Rust data structure modules, representing significant progress in AI-assisted formal verification.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers introduce Agent Behavioral Contracts (ABC), a formal framework for specifying and enforcing reliable behavior in autonomous AI agents. The system addresses critical issues of drift and governance failures in AI deployments by implementing runtime-enforceable contracts that achieve 88-100% compliance rates and significantly improve violation detection.
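A runtime-enforceable contract can be sketched as pre/postcondition checks wrapped around an agent action. This is a generic design-by-contract sketch; ABC's actual specification language and enforcement mechanism are not described in the summary.

```python
import functools

def contract(pre=None, post=None):
    """Runtime-enforceable behavioral contract as a decorator
    (a generic sketch, not ABC's actual mechanism)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if pre and not pre(*args, **kwargs):
                raise ValueError(f"precondition violated: {fn.__name__}")
            result = fn(*args, **kwargs)
            if post and not post(result):
                raise ValueError(f"postcondition violated: {fn.__name__}")
            return result
        return inner
    return wrap

@contract(pre=lambda amount: amount > 0, post=lambda r: r >= 0)
def withdraw(amount):
    return 100 - amount   # toy agent action against a balance of 100

print(withdraw(30))   # 70
# withdraw(-5) or withdraw(200) would raise, flagging a violation
```

Because the check runs on every call, violations are caught at the moment of drift rather than discovered after the fact, which is what enables the compliance rates the paper reports.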
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers introduced LeanCat, a benchmark comprising 100 category-theory tasks in Lean to test AI's formal theorem proving capabilities. State-of-the-art models achieved only 12% success rates, revealing significant limitations in abstract mathematical reasoning, while a new retrieval-augmented approach doubled performance to 24%.
AI · Bullish · OpenAI News · Feb 2 · 7/10
🧠Researchers have developed a neural theorem prover for Lean that successfully solved challenging high-school mathematics olympiad problems, including those from AMC12, AIME competitions, and two problems adapted from the International Mathematical Olympiad (IMO). This represents a significant advancement in AI's ability to handle formal mathematical reasoning and proof generation.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠VeriTrans is a machine learning system that converts natural language requirements into formal logic suitable for automated solvers, using a validator-gated pipeline to ensure reliability. Achieving 94.46% correctness on 2,100 specifications, the system combines fine-tuned language models with round-trip verification and deterministic execution, enabling auditable translation for critical applications.
$PL $NL $CNF
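For small propositional specifications, a round-trip check like the one VeriTrans gates on can be made exhaustive: compare the candidate CNF against the intended formula on every assignment. The alarm example and the clause encoding below are invented for illustration; VeriTrans's validator presumably scales beyond brute force.

```python
from itertools import product

def eval_cnf(clauses, assignment):
    """CNF as a list of clauses; each clause is a list of
    (var, polarity) literals. True iff every clause has a true literal."""
    return all(any(assignment[v] == pol for v, pol in clause)
               for clause in clauses)

def equivalent(f, clauses, variables):
    """Round-trip check: does the CNF agree with the intended formula f
    on every assignment? Exhaustive, so only feasible for small specs."""
    return all(
        f(dict(zip(variables, vals)))
        == eval_cnf(clauses, dict(zip(variables, vals)))
        for vals in product([False, True], repeat=len(variables))
    )

# Intended spec: "if the alarm is armed and a door opens, sound the siren"
spec = lambda a: (not (a["armed"] and a["door"])) or a["siren"]
# Candidate CNF translation: (~armed | ~door | siren)
cnf = [[("armed", False), ("door", False), ("siren", True)]]
print(equivalent(spec, cnf, ["armed", "door", "siren"]))  # True
```

A translation that dropped a literal would disagree on some assignment and be rejected by the gate, which is the auditability property the pipeline is after.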
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose SGH (Structured Graph Harness), a framework that replaces iterative Agent Loops with explicit directed acyclic graphs (DAGs) for LLM agent execution. The approach addresses structural weaknesses in current agent design by enforcing immutable execution plans, separating planning from recovery, and implementing strict escalation protocols, trading some flexibility for improved controllability and verifiability.
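Executing an immutable plan as a DAG, rather than an open-ended agent loop, can be sketched with a topological traversal. The task graph below is a toy stand-in; SGH's recovery separation and escalation protocols are not modeled.

```python
from graphlib import TopologicalSorter

def run_plan(dag, tasks):
    """Execute an immutable plan: dag maps each task name to its set of
    prerequisites; each task is a function of prior results. No loop
    can revise the plan mid-run -- the controllability/verifiability
    trade SGH makes (sketch only)."""
    results = {}
    for node in TopologicalSorter(dag).static_order():
        results[node] = tasks[node](results)
    return results

dag = {"fetch": set(), "parse": {"fetch"}, "report": {"parse"}}
tasks = {
    "fetch":  lambda r: "raw data",
    "parse":  lambda r: r["fetch"].upper(),
    "report": lambda r: f"report on {r['parse']}",
}
print(run_plan(dag, tasks)["report"])  # report on RAW DATA
```

Because the plan is fixed before execution, the full set of reachable actions can be inspected (and verified) ahead of time, which an unbounded agent loop cannot offer.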
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Doctoral research proposes a systematic framework for multi-agent LLM pair programming that improves code reliability and auditability through externalized intent and iterative validation. The study addresses critical gaps in how AI coding agents can produce trustworthy outputs aligned with developer objectives across testing, implementation, and maintenance workflows.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.
$AVAX
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers have developed the first formal mathematical framework for verifying AI agent protocols, specifically comparing Schema-Guided Dialogue (SGD) and Model Context Protocol (MCP). They proved these systems are structurally similar but identified critical gaps in MCP's capabilities, proposing MCP+ extensions to achieve full equivalence with SGD.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.
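The comparison behind that finding reduces to a budget-constrained ranking metric: how many true hits does each model's ranking surface within a screening budget of k candidates? The sketch below uses a plain precision@k-style count with invented scores; BSDS's formally verified definition may differ in detail.

```python
def discoveries_at_budget(scores, labels, k):
    """True hits a model's ranking surfaces within a screening budget
    of k candidates (precision@k-style sketch, not BSDS's exact form)."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    return sum(label for _, label in ranked[:k])

# Hypothetical scores from two models over the same 6 candidates
labels     = [1, 0, 1, 0, 0, 1]          # 1 = true active compound
rf_scores  = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7]
llm_scores = [0.6, 0.9, 0.4, 0.8, 0.2, 0.5]
print(discoveries_at_budget(rf_scores, labels, k=3))   # 3
print(discoveries_at_budget(llm_scores, labels, k=3))  # 1
```

Under a fixed budget, only the ordering matters, so a cheap classifier that ranks well beats an expensive model that ranks poorly — the pattern the paper reports.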
Crypto · Bullish · CryptoSlate · Mar 10 · 6/10
⛓️Cardano is positioning itself as a regulatory-compliant blockchain through recent governance and formal verification updates, potentially gaining advantages as Europe's MiCA regulations push the crypto industry toward greater accountability. The platform's historically slow but methodical approach to development may now be an asset in an increasingly rule-heavy regulatory environment.
$ADA
AI · Neutral · arXiv – CS AI · Mar 6 · 6/10
🧠Researchers introduce X-RAY, a new system for analyzing large language model reasoning capabilities through formally verified probes that isolate structural components of reasoning. The study reveals LLMs handle constraint refinement well but struggle with solution-space restructuring, providing contamination-free evaluation methods.
DeFi · Bullish · The Block · Mar 5 · 6/10
💎Aave Labs has announced a comprehensive security framework for its upcoming V4 protocol, featuring formal verification, layered security reviews, and a bug bounty program. This follows a substantial $1.5 million audit program, demonstrating the protocol's commitment to security before launch.
$AAVE
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks like ClawHavoc that infiltrated over 1,200 malicious skills. The framework achieved a 96.95% F1 score with 100% precision (zero false positives) in detecting malicious AI agent skills.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠ATLAS is a new AI-driven framework that uses large language models to automate System-on-Chip (SoC) security verification by converting threat models into formal verification properties. The system successfully detected 39 out of 48 security weaknesses in benchmark tests and generated correct security properties for 33 of those vulnerabilities.
AI · Bullish · IEEE Spectrum – AI · Mar 2 · 7/10
🧠Ukrainian mathematician Maryna Viazovska's Fields Medal-winning sphere packing proofs have been formally verified through AI-human collaboration using Math, Inc.'s Gauss AI system and the Lean proof assistant. This represents a significant breakthrough in AI's ability to assist with complex mathematical research and formal proof verification.
$TAO
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10
🧠Researchers propose a new framework for foundation world models that enables autonomous agents to learn, verify, and adapt reliably in dynamic environments. The approach combines reinforcement learning with formal verification and adaptive abstraction to create agents that can synthesize verifiable programs and maintain correctness while adapting to novel conditions.