y0news

#formal-methods News & Analysis

25 articles tagged with #formal-methods. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

Researchers introduce VASO, a framework that combines formal verification with self-evolving language model skills for robot control, achieving 97.2% specification compliance on physical tasks. The approach bridges formal methods and foundation models by using counterexamples from model checking as optimization feedback for skill contracts rather than modifying underlying model weights.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

Learning to Solve and Optimize by Evolving Code

Researchers introduce CHECKMATE, a tool that automatically generates optimization algorithms through code evolution, requiring only formal problem specifications and natural language descriptions rather than expert-designed heuristics. The evolved algorithms outperform state-of-the-art solvers on industrial configuration and scheduling problems, demonstrating formal methods can guide automated algorithm discovery for complex real-world optimization challenges.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

Researchers present a hybrid neuro-symbolic architecture that combines formal logic with neural semantic analysis to verify LLM outputs in high-stakes domains like healthcare. The system detects over 83% of hallucinations in structured data and 72% of semantic fabrications while reducing report creation time by 30%, demonstrating practical safeguards for deploying LLMs in data-sensitive applications.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Towards provable probabilistic safety for scalable embodied AI systems

Researchers propose a shift from deterministic to probabilistic safety verification for embodied AI systems, arguing that provable probabilistic guarantees offer a more practical path to large-scale deployment in safety-critical applications like autonomous vehicles and robotics than the infeasible goal of absolute safety across all scenarios.

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Incompleteness of AI Safety Verification via Kolmogorov Complexity

Researchers prove a fundamental theoretical limit in AI safety verification using Kolmogorov complexity theory. They demonstrate that no finite formal verifier can certify all policy-compliant AI instances of arbitrarily high complexity, revealing intrinsic information-theoretic barriers beyond computational constraints.

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

SentinelAgent introduces a formal framework for securing multi-agent AI systems through verifiable delegation chains, achieving 100% accuracy in testing with zero false positives. The system uses seven verification properties and a non-LLM authority service to ensure secure delegation between AI agents in federal environments.

AI · Neutral · arXiv – CS AI · 6d ago · 5/10

The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism

Researchers introduce Theory of Mind Utility (ToM-U), a formal computational framework for modeling how agents infer others' beliefs by tracking information access and credibility. The model uses directed graphs called Local Epistemic World Models to represent epistemic relationships and generates falsifiable predictions about mentalizing failures, advancing cognitive science theory beyond existing Bayesian and simulation-based approaches.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Runtime Enforcement of Hybrid System Properties

Researchers propose a runtime enforcement framework using Hybrid Automata to actively prevent safety violations in autonomous and cyber-physical systems by monitoring and modifying unsafe behaviors in real time. The approach combines discrete-event editing with continuous monitoring and is validated through an Adaptive Cruise Control case study, demonstrating effective safety compliance with minimal computational overhead.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Hybrid Robustness Verification for Spatio-Temporal Neural Networks

Researchers introduce Spatio-Temporal Bound Propagation (STBP), a verification framework for neural networks processing video and volumetric data that provides formal robustness guarantees under realistic adversarial constraints. The method achieves 1.7x higher certified robust accuracy compared to existing approaches while maintaining computational scalability, addressing a critical gap in AI safety for applications like autonomous driving and medical imaging.
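For readers unfamiliar with bound propagation, here is a minimal sketch of the simplest variant, plain interval bound propagation on a small fully connected ReLU network. It is not the STBP method from the paper: the function names (`affine_bounds`, `relu_bounds`, `certify`), the list-of-lists weight layout, and the L-infinity perturbation model are all assumptions made for illustration.

```python
def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight takes the lower input bound for the lower output
        # bound; a negative weight takes the upper input bound instead.
        out_lo.append(bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row)))
        out_hi.append(bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(layers, x, eps, label):
    """Return True if every input within eps (L-infinity) of x is provably
    classified as `label`. False means "not proven", not "unsafe"."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # no ReLU after the output layer
            lo, hi = relu_bounds(lo, hi)
    # Robust if the label's lower bound beats every other logit's upper bound.
    return all(lo[label] > hi[k] for k in range(len(lo)) if k != label)


# Hypothetical two-layer network used only to exercise the functions.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
small = certify(layers, [1.0, 0.0], 0.1, 0)
large = certify(layers, [1.0, 0.0], 0.6, 0)
```

With this toy network the small perturbation budget is certified and the large one is not, which illustrates the usual trade-off: interval bounds are cheap but loose, so a failed certificate does not by itself show that an adversarial input exists.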

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

TOKI is a formal framework that types contradiction resolution in LLM-agent persistent memory systems as a write-time concurrency control problem. The research proves that four common heuristics used in production systems admit unspecified isolation levels and anomalies, and proposes a bitemporal operator algebra with audit-row provenance that excludes three critical write-time anomalies while maintaining language-model oversight.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Robust Shielding for Safe Reinforcement Learning

Researchers introduce a novel shielding framework for reinforcement learning agents that guarantees safety without requiring prior knowledge of system dynamics. By combining robust MDPs with linear temporal logic specifications and PAC learning guarantees, the approach enables the creation of minimally restrictive safety shields for unknown environments while maintaining strong performance as data accumulates.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

SEMBridge: Tagless-Final Program Semantics with Weakest-Precondition and Bounded-Checking Interpretations

SEMBridge is a tagless-final framework that enables developers to write program semantics once and automatically generate multiple interpretations, including executable code, weakest-precondition verification conditions, and bounded-checking validators. The Python prototype demonstrates synchronization of formal verification artifacts with executable semantics across loop-free imperative programs, addressing the practical gap between formal methods and software engineering.
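The "write once, interpret many ways" idea can be sketched in a few lines of Python. This is an illustration of the tagless-final pattern only, not SEMBridge's actual API: the class names `Exec` and `WP`, the method names `assign`, `seq`, and `ite`, and the representation of predicates as Python functions over a state dictionary (rather than symbolic verification conditions) are all assumptions.

```python
class Exec:
    """Interpretation 1: a statement is a state transformer (state -> state)."""

    def assign(self, var, expr):
        return lambda s: {**s, var: expr(s)}

    def seq(self, first, second):
        return lambda s: second(first(s))

    def ite(self, cond, then, otherwise):
        return lambda s: then(s) if cond(s) else otherwise(s)


class WP:
    """Interpretation 2: a statement is a predicate transformer
    (postcondition -> weakest precondition)."""

    def assign(self, var, expr):
        # wp(x := e, Q) holds in s exactly when Q holds in s updated with x = e.
        return lambda post: (lambda s: post({**s, var: expr(s)}))

    def seq(self, first, second):
        # wp(S1; S2, Q) = wp(S1, wp(S2, Q))
        return lambda post: first(second(post))

    def ite(self, cond, then, otherwise):
        # Pick the branch's precondition according to the guard.
        return lambda post: (
            lambda s: then(post)(s) if cond(s) else otherwise(post)(s)
        )


def abs_program(sem):
    """y := |x|, written once against the abstract interface `sem`."""
    return sem.ite(
        lambda s: s["x"] < 0,
        sem.assign("y", lambda s: -s["x"]),
        sem.assign("y", lambda s: s["x"]),
    )


def bounded_check(pre, lo, hi):
    """Bounded checking: test the precondition for every x in [lo, hi]."""
    return all(pre({"x": x, "y": 0}) for x in range(lo, hi + 1))


run = abs_program(Exec())                      # executable semantics
pre = abs_program(WP())(lambda s: s["y"] >= 0)  # wp for postcondition y >= 0

result = run({"x": -3, "y": 0})                # {"x": -3, "y": 3}
always_holds = bounded_check(pre, -50, 50)
```

Because `abs_program` never mentions a concrete interpreter, the executable behavior and the verification artifact are derived from the same definition and cannot drift apart, which is the synchronization property the summary describes.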

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Neural Network Verification using Partial Multi-Neuron Relaxation

Researchers present a novel neural network verification method called partial multi-neuron relaxation that selectively applies computationally expensive multi-neuron bounds to strategically chosen neurons rather than all neurons. This approach balances the tightness-scalability tradeoff in formal verification, showing improved performance when integrated into the Marabou verifier.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Researchers introduce Hilbert-Geo, a neural-symbolic AI framework for solving solid geometry problems by combining formal language representation with theorem-based reasoning. The system achieves 77.3% accuracy on solid geometry tasks, significantly outperforming leading AI models like GPT-4 and Gemini-2.5-pro, demonstrating advances in multimodal geometric reasoning.

GPT-5 · Gemini
AI · Neutral · arXiv – CS AI · May 28 · 6/10

An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

Researchers developed a hybrid system combining formal symbolic planning with large language models to improve capability-based planning in industrial automation. The system integrates natural-language interaction, explainability, and human-approved knowledge model adaptation, achieving high accuracy across planning and query tasks while maintaining formal correctness guarantees.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

The Computational Boundary of Inference: Capability Internalization, Training, and the Turing Jump

A new computability theory paper proves that finite internal self-modification in AI systems cannot exceed their existing computational layer, while qualitatively stronger capabilities require access to a higher computational level (the Turing jump). This formally separates recursive self-improvement narratives into within-layer iteration versus genuine capability ascent, constraining theoretical claims about AI recursive self-improvement.

AI · Neutral · arXiv – CS AI · May 27 · 5/10

2-ASP(Q) programs with weak constraints: Complexity and efficient implementation

Researchers present 2-ASP(Q)^w, a fragment of Answer Set Programming extended with quantifiers and weak constraints, proving its theoretical complexity bounds and introducing practical computation strategies using CEGAR techniques. The work bridges theoretical computer science with implementable solutions for optimization problems, offering both formal completeness results and experimental validation on real-world benchmarks.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)

Researchers propose a formal framework for describing knowledge graph affordances to agents, extending decades-old semantic web service standards to address modern KG discovery and composition challenges. The framework introduces the Agentic Affordance Profile (AAP), a metadata layer that enables principled selection and failure diagnosis by specifying what agents can prove from a knowledge graph and under what epistemic conditions.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Functional Stable Model Semantics and Answer Set Programming Modulo Theories

Researchers demonstrate how functional stable model semantics enhances Answer Set Programming Modulo Theories (ASPMT), enabling integration of intensional functions that derive values from other predicates rather than pre-defined sources. The framework allows tight ASPMT programs to translate into SMT instances, extending the theoretical foundations of logic programming.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Cplus2ASP: Computing Action Language C+ in Answer Set Programming

Cplus2ASP Version 2 is a new system that translates action language C+ into answer set programming, offering significant performance improvements over the Causal Calculator through modern ASP solving techniques. The tool supports incremental execution, external atoms via Lua integration, and extensible translations for other action languages, making it relevant for automated reasoning and planning applications.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Researchers introduce DPrivBench, a benchmark for evaluating how well large language models can reason about differential privacy algorithms and verify their correctness. Testing shows current LLMs handle basic DP mechanisms competently but fail significantly on advanced algorithms, exposing critical gaps in automated privacy reasoning capabilities.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Modeling Co-Pilots for Text-to-Model Translation

Researchers introduce Text2Model and Text2Zinc, frameworks that use large language models to translate natural language descriptions into formal optimization and satisfaction models. The work represents the first unified approach combining both problem types with a solver-agnostic architecture. Experiments show competitive performance, but LLMs remain imperfect at this task.

AI × Crypto · Bullish · arXiv – CS AI · Apr 13 · 6/10

SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing

SPEAR is a multi-agent AI framework designed to automate smart contract auditing through coordinated specialist agents that prioritize contracts, allocate tasks, and recover from failures autonomously. The research demonstrates how established multi-agent system patterns can improve security analysis workflows beyond centralized or pipeline-based approaches.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

FAME: Formal Abstract Minimal Explanation for Neural Networks

Researchers introduce FAME (Formal Abstract Minimal Explanations), a new method for explaining neural network decisions that scales to large networks while producing smaller explanations. The approach uses abstract interpretation and dedicated perturbation domains to eliminate irrelevant features and converge to minimal explanations more efficiently than existing methods.