y0news

#formal-verification News & Analysis

34 articles tagged with #formal-verification. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

34 articles
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

Researchers introduce SafeAdapt, a novel framework for updating reinforcement learning policies while maintaining provable safety guarantees across changing environments. The approach uses a 'Rashomon set' to identify safe parameter regions and projects policy updates onto this certified space, addressing the critical challenge of deploying RL agents in safety-critical applications where dynamics and objectives evolve over time.
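
The projection step can be sketched in a few lines. This is a minimal illustration under a simplifying assumption: the certified safe region is idealized here as an L2 ball around known-safe parameters, whereas the paper's Rashomon set has no such simple shape, and the function names are hypothetical.

```python
import math

def project_to_safe_set(theta_new, theta_cert, radius):
    """Project a proposed policy update back into a certified safe region,
    idealized here as an L2 ball around known-safe parameters."""
    delta = [a - b for a, b in zip(theta_new, theta_cert)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= radius:
        return theta_new                  # update is already certified safe
    scale = radius / norm                 # shrink onto the safe-set boundary
    return [b + d * scale for b, d in zip(theta_cert, delta)]

# A gradient step of length 5 is clipped back to the unit-radius safe set:
print(project_to_safe_set([3.0, 4.0], [0.0, 0.0], 1.0))  # ≈ [0.6, 0.8]
```

The key property is that updates already inside the certified set pass through unchanged, so safe fine-tuning is never penalized.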

AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠

ClawLess: A Security Model of AI Agents

ClawLess introduces a formally verified security framework that enforces policies on AI agents operating with code execution and information retrieval capabilities, addressing risks that existing training-based approaches cannot adequately mitigate. The system uses BPF-based syscall interception and a user-space kernel to prevent adversarial AI agents from violating security boundaries, regardless of their internal design.

AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠

LeanTutor: Towards a Verified AI Mathematical Proof Tutor

Researchers have developed LeanTutor, a proof-of-concept AI system that combines Large Language Models with theorem provers to create a mathematically verified proof tutor. The system features three modules for autoformalization, proof-checking, and natural language feedback, evaluated using PeanoBench, a new dataset of 371 Peano Arithmetic proofs.
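
A PeanoBench-style task pairs a student's informal argument with a machine-checkable formal statement. A toy Lean 4 goal of the kind the tutor's proof-checking module must certify (illustrative only, not drawn from the dataset):

```lean
-- A student's informal claim that addition commutes, autoformalized into a
-- Lean goal; `omega` discharges linear arithmetic over Nat automatically.
theorem add_comm' (m n : Nat) : m + n = n + m := by
  omega
```

When the checker accepts a step like this, the feedback module can translate the certified proof state back into natural language for the student.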

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠

Agentic AI-based Coverage Closure for Formal Verification

Researchers have developed an agentic AI-driven workflow using Large Language Models to automate coverage analysis for formal verification in integrated chip development. The approach systematically identifies coverage gaps and generates required formal properties, demonstrating measurable improvements in coverage metrics that correlate with design complexity.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠

Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification

Researchers have enhanced the Saarthi AI framework for formal verification, achieving 70% better accuracy in generating SystemVerilog assertions and 50% fewer iterations to reach coverage closure. The framework uses multi-agent collaboration and improved RAG techniques to move toward domain-specific AI intelligence for verification tasks.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠

Talking with Verifiers: Automatic Specification Generation for Neural Network Verification

Researchers have developed a framework that allows neural network verification tools to accept natural language specifications instead of low-level technical constraints. The system automatically translates human-readable requirements into formal verification queries, significantly expanding the practical applicability of neural network verification across diverse domains.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠

IoUCert: Robustness Verification for Anchor-based Object Detectors

Researchers introduce IoUCert, a new formal verification framework that enables robustness verification for anchor-based object detection models like SSD, YOLOv2, and YOLOv3. The approach uses novel coordinate transformations and Interval Bound Propagation to overcome previous limitations in verifying object detection systems against input perturbations.
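
The Interval Bound Propagation primitive behind such verifiers is simple to sketch: push an input box through each affine layer, routing each weight's sign to whichever input bound extremizes the output. A minimal stand-alone version (the paper's coordinate transformations for anchor decoding are not shown; names are illustrative):

```python
def ibp_affine(lo, hi, W, b):
    """Interval bound propagation through an affine layer y = Wx + b:
    positive weights take the matching input bound, negative weights the
    opposite one, giving sound (if loose) bounds on every output."""
    y_lo, y_hi = [], []
    for row, bias in zip(W, b):
        lo_acc = hi_acc = bias
        for w, l, h in zip(row, lo, hi):
            lo_acc += w * (l if w >= 0 else h)
            hi_acc += w * (h if w >= 0 else l)
        y_lo.append(lo_acc)
        y_hi.append(hi_acc)
    return y_lo, y_hi

# Sound output bounds for inputs perturbed within [-0.1, 0.1]^2:
lo, hi = ibp_affine([-0.1, -0.1], [0.1, 0.1], [[1.0, -2.0]], [0.5])
print(lo, hi)  # ≈ [0.2] [0.8]
```

Composing this layer-by-layer (with monotone activations applied elementwise to both bounds) yields the certified output intervals a detector-level property is checked against.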

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠

VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus

VeriStruct is a new AI framework that automates formal verification of complex data structure modules in the Verus programming language. The system achieved a 99.2% success rate in verifying 128 out of 129 functions across eleven Rust data structure modules, representing significant progress in AI-assisted formal verification.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠

Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents

Researchers introduce Agent Behavioral Contracts (ABC), a formal framework for specifying and enforcing reliable behavior in autonomous AI agents. The system addresses critical issues of drift and governance failures in AI deployments by implementing runtime-enforceable contracts that achieve 88-100% compliance rates and significantly improve violation detection.
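
Runtime enforcement of a behavioral contract can be as simple as wrapping each agent action with pre- and postcondition checks. A minimal sketch (the decorator name and conditions are hypothetical, not the ABC paper's API):

```python
def behavioral_contract(require, ensure):
    """Wrap an agent action so its pre- and postconditions are checked at
    runtime; any violation raises instead of silently drifting."""
    def wrap(action):
        def checked(*args, **kwargs):
            if not require(*args, **kwargs):
                raise RuntimeError("contract violated: precondition")
            result = action(*args, **kwargs)
            if not ensure(result):
                raise RuntimeError("contract violated: postcondition")
            return result
        return checked
    return wrap

@behavioral_contract(require=lambda amount: 0 < amount <= 100,
                     ensure=lambda r: r["status"] == "logged")
def issue_refund(amount):
    # Hypothetical agent tool: refunds must be bounded and always logged.
    return {"status": "logged", "amount": amount}

print(issue_refund(50)["status"])  # "logged"; issue_refund(500) would raise
```

The point of runtime (rather than training-time) enforcement is that a violation is detected and blocked regardless of what the underlying model proposes.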

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠

LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)

Researchers introduced LeanCat, a benchmark comprising 100 category-theory tasks in Lean to test AI's formal theorem proving capabilities. State-of-the-art models achieved only 12% success rates, revealing significant limitations in abstract mathematical reasoning, while a new retrieval-augmented approach doubled performance to 24%.

AI · Bullish · OpenAI News · Feb 2 · 7/10
🧠

Solving (some) formal math olympiad problems

Researchers have developed a neural theorem prover for Lean that successfully solved challenging high-school mathematics olympiad problems, including those from AMC12, AIME competitions, and two problems adapted from the International Mathematical Olympiad (IMO). This represents a significant advancement in AI's ability to handle formal mathematical reasoning and proof generation.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline

VeriTrans is a machine learning system that converts natural language requirements into formal logic suitable for automated solvers, using a validator-gated pipeline to ensure reliability. Achieving 94.46% correctness on 2,100 specifications, the system combines fine-tuned language models with round-trip verification and deterministic execution, enabling auditable translation for critical applications.
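
The validator gate is the key design choice: LLM output is never trusted directly, only after a deterministic check passes. A minimal sketch in which the check is idealized as brute-force truth-table equivalence against a trusted reference (the paper's pipeline targets solver-ready forms like CNF; all names here are illustrative):

```python
import itertools

def equivalent(f, g, num_vars):
    """Deterministic validator: brute-force check that two propositional
    formulas (as Python callables over bools) agree on every assignment."""
    return all(f(*vals) == g(*vals)
               for vals in itertools.product([False, True], repeat=num_vars))

def validator_gate(candidates, reference, num_vars):
    """Accept the first LLM-proposed formalization the validator proves
    equivalent to a trusted reference; reject everything else."""
    for cand in candidates:
        if equivalent(cand, reference, num_vars):
            return cand
    return None

# "p implies q" proposed two ways; only the correct one passes the gate.
wrong = lambda p, q: p and q
right = lambda p, q: (not p) or q
ref   = lambda p, q: (not p) or q
accepted = validator_gate([wrong, right], ref, 2)
print(accepted is right)  # True
```

Because the gate is deterministic and exhaustive over its domain, every accepted translation is auditable independently of the model that produced it.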

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution

Researchers propose SGH (Structured Graph Harness), a framework that replaces iterative Agent Loops with explicit directed acyclic graphs (DAGs) for LLM agent execution. The approach addresses structural weaknesses in current agent design by enforcing immutable execution plans, separating planning from recovery, and implementing strict escalation protocols, trading some flexibility for improved controllability and verifiability.
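
The core scheduling idea, executing an immutable plan as a DAG rather than improvising turn by turn, can be sketched with the standard-library topological sorter (task names are illustrative, not the paper's API):

```python
from graphlib import TopologicalSorter

def run_plan(graph, tasks):
    """Execute an immutable agent plan as a DAG: each node runs only after
    all of its dependencies finish, so the execution order is fixed up
    front instead of improvised turn-by-turn in an agent loop."""
    order = TopologicalSorter(graph).static_order()
    results = {}
    for node in order:
        deps = {d: results[d] for d in graph.get(node, ())}
        results[node] = tasks[node](deps)
    return results

# Plan: fetch two sources, then summarize both (names are illustrative).
graph = {"summarize": {"fetch_a", "fetch_b"}, "fetch_a": set(), "fetch_b": set()}
tasks = {
    "fetch_a": lambda deps: "doc A",
    "fetch_b": lambda deps: "doc B",
    "summarize": lambda deps: " + ".join(sorted(deps.values())),
}
print(run_plan(graph, tasks)["summarize"])  # doc A + doc B
```

Because the graph is fixed before execution, controllability and verifiability come for free: any run either follows the certified plan or escalates, exactly the trade against loop-style flexibility the paper describes.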

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

From Helpful to Trustworthy: LLM Agents for Pair Programming

Doctoral research proposes a systematic framework for multi-agent LLM pair programming that improves code reliability and auditability through externalized intent and iterative validation. The study addresses critical gaps in how AI coding agents can produce trustworthy outputs aligned with developer objectives across testing, implementation, and maintenance workflows.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.
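
The division of labor is: the LLM proposes a proof sketch, and a cheap checker accepts it only when every step follows from established facts by an explicit rule. A toy checker restricted to modus ponens (far weaker than Lean or Coq, and not the paper's actual checker; rule contents are illustrative):

```python
def check_sketch(facts, rules, steps):
    """Lightweight proof checker: each claimed step must follow from an
    already-established fact via one of the given implications (modus
    ponens only). Cheap enough to gate every LLM-generated sketch."""
    known = set(facts)
    for claim in steps:
        if not any(p in known and q == claim for p, q in rules):
            return False          # step doesn't follow; reject the sketch
        known.add(claim)          # accepted steps become usable facts
    return True

rules = [("n even", "n^2 even"), ("n^2 even", "n^2 divisible by 4")]
print(check_sketch({"n even"}, rules, ["n^2 even", "n^2 divisible by 4"]))  # True
print(check_sketch({"n even"}, rules, ["n^2 divisible by 4"]))              # False
```

The second call fails because the sketch skips an intermediate step, which is precisely the kind of LLM reasoning gap a lightweight checker can catch without full formalization.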

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

Researchers have developed the first formal mathematical framework for verifying AI agent protocols, specifically comparing Schema-Guided Dialogue (SGD) and Model Context Protocol (MCP). They proved these systems are structurally similar but identified critical gaps in MCP's capabilities, proposing MCP+ extensions to achieve full equivalence with SGD.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠

Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection

Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.
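
A budget-sensitive score of this general shape can be sketched as follows, with the budget idealized as a fixed number of candidates to test and the metric as hits over achievable hits (the verified BSDS definition may differ; all names are illustrative):

```python
def budget_discovery_score(scores, labels, budget):
    """Sketch of a budget-sensitive discovery score: test only the
    top-`budget` candidates by model score, then report the fraction of
    hits found relative to the best achievable at that same budget."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])[:budget]
    hits = sum(labels[i] for i in ranked)
    achievable = min(budget, sum(labels))   # perfect model's ceiling
    return hits / achievable

# 2 of the 3 true actives land inside a testing budget of 3 candidates:
scores = [0.9, 0.1, 0.8, 0.7, 0.2]
labels = [1,   1,   0,   1,   0]
print(round(budget_discovery_score(scores, labels, budget=3), 3))  # 0.667
```

Normalizing by the budget-limited ceiling is what makes models comparable across budgets, and explains how a cheap classifier can tie or beat an LLM under this metric.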

Crypto · Bullish · CryptoSlate · Mar 10 · 6/10
⛓️

Cardano spent years looking slow. Now that may help it win in crypto’s rule-heavy era

Cardano is positioning itself as a regulatory-compliant blockchain through recent governance and formal verification updates, potentially gaining advantages as Europe's MiCA regulations push the crypto industry toward greater accountability. The platform's historically slow but methodical approach to development may now be an asset in an increasingly rule-heavy regulatory environment.

$ADA
AI · Neutral · arXiv – CS AI · Mar 6 · 6/10
🧠

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Researchers introduce X-RAY, a new system for analyzing large language model reasoning capabilities through formally verified probes that isolate structural components of reasoning. The study reveals LLMs handle constraint refinement well but struggle with solution-space restructuring, providing contamination-free evaluation methods.

DeFi · Bullish · The Block · Mar 5 · 6/10
💎

Aave Labs outlines layered security plan for V4 after $1.5 million audit program

Aave Labs has announced a comprehensive security framework for its upcoming V4 protocol, featuring formal verification, layered security reviews, and a bug bounty program. This follows a substantial $1.5 million audit program, demonstrating the protocol's commitment to security before launch.

$AAVE
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠

Formal Analysis and Supply Chain Security for Agentic AI Skills

Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks like ClawHavoc that infiltrated over 1,200 malicious skills. The framework achieved 96.95% F1 score with 100% precision and zero false positives in detecting malicious AI agent skills.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

ATLAS: AI-Assisted Threat-to-Assertion Learning for System-on-Chip Security Verification

ATLAS is a new AI-driven framework that uses large language models to automate System-on-Chip (SoC) security verification by converting threat models into formal verification properties. The system successfully detected 39 out of 48 security weaknesses in benchmark tests and generated correct security properties for 33 of those vulnerabilities.

AI · Bullish · IEEE Spectrum – AI · Mar 2 · 7/10
🧠

Watershed Moment for AI–human Collaboration in Math

Ukrainian mathematician Maryna Viazovska's Fields Medal-winning sphere packing proofs have been formally verified through AI-human collaboration using Math, Inc.'s Gauss AI system and the Lean proof assistant. This represents a significant breakthrough in AI's ability to assist with complex mathematical research and formal proof verification.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10
🧠

Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments

Researchers propose a new framework for foundation world models that enables autonomous agents to learn, verify, and adapt reliably in dynamic environments. The approach combines reinforcement learning with formal verification and adaptive abstraction to create agents that can synthesize verifiable programs and maintain correctness while adapting to novel conditions.

Page 1 of 2