#metamorphic-testing News & Analysis

7 articles tagged with #metamorphic-testing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

7 articles

AIBearisharXiv – CS AI · Jun 97/10

🧠

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

Researchers introduce LGMT, a novel testing framework that uses first-order logic to evaluate Large Language Models' reasoning reliability by creating logically equivalent test cases. The study reveals that state-of-the-art LLMs fail consistency checks under semantic transformations, exposing hidden reasoning defects that traditional benchmarks miss.

AINeutralarXiv – CS AI · Mar 167/10

🧠

Semantic Invariance in Agentic AI

Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.

AIBullisharXiv – CS AI · Mar 57/10

🧠

An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software

Researchers developed a multi-agent LLM system that translates legal statutes into executable software, using U.S. tax preparation as a test case. The system achieved a 45% success rate using GPT-4o-mini, significantly outperforming larger frontier models like GPT-4o and Claude 3.5 which only achieved 9-15% success rates on complex tax code tasks.

🧠 GPT-4🧠 Claude

AINeutralarXiv – CS AI · Jun 56/10

🧠

Metamorphic Testing with the Rashomon Set: Explanation Faithfulness in Machine Learning

Researchers propose a metamorphic testing framework to evaluate the trustworthiness of machine learning model explanations by identifying inconsistencies between model predictions and feature attributions, addressing the Rashomon effect where multiple models achieve similar performance but yield conflicting explanations.

AINeutralarXiv – CS AI · May 286/10

🧠

Multi-Agent LLM-based Metamorphic Testing for REST APIs

Researchers present ARMeta, an LLM-based multi-agent tool that automates metamorphic testing for REST APIs by identifying test scenarios and generating executable tests without requiring explicit correct outputs. The approach addresses the test oracle problem in API validation and demonstrates complementary capabilities to traditional scenario-based testing methods.

AINeutralarXiv – CS AI · Mar 266/10

🧠

LLMORPH: Automated Metamorphic Testing of Large Language Models

Researchers have developed LLMORPH, an automated testing tool for Large Language Models that uses Metamorphic Testing to identify faulty behaviors without requiring human-labeled data. The tool was tested on GPT-4, LLAMA3, and HERMES 2 across four NLP benchmarks, generating over 561,000 test executions and successfully exposing model inconsistencies.

🧠 GPT-4

AINeutralarXiv – CS AI · Mar 275/10

🧠

From Untestable to Testable: Metamorphic Testing in the Age of LLMs

A research paper introduces metamorphic testing as a solution for testing AI and LLM-integrated software systems. The approach addresses the challenge of unreliable LLM outputs and limited labeled ground truth by using relationships between multiple test executions as test oracles.