AI · Neutral · arXiv - CS AI · 1d ago · 6/10
LLMORPH: Automated Metamorphic Testing of Large Language Models
Researchers have developed LLMORPH, an automated testing tool for Large Language Models (LLMs) that applies Metamorphic Testing to uncover faulty behavior without requiring human-labeled data. Evaluated on GPT-4, LLAMA3, and HERMES 2 across four NLP benchmarks, the tool ran more than 561,000 test executions and exposed inconsistencies in the models' behavior.
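As a rough illustration of the general idea behind Metamorphic Testing, the sketch below checks whether a model's output stays the same when its input is changed in a way that should not alter the answer. This is not LLMORPH's code or API: `toy_sentiment_model`, `uppercase_transform`, and `run_metamorphic_test` are hypothetical names, and the toy model is a deliberately flawed stand-in for a real LLM call.

```python
from typing import Callable, List, Tuple


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call; deliberately case-sensitive."""
    return "positive" if "good" in text else "negative"


def uppercase_transform(text: str) -> str:
    """Input change assumed (for illustration) to preserve the sentiment."""
    return text.upper()


def run_metamorphic_test(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (input, original output, follow-up output) for each violation.

    No ground-truth labels are needed: the check only compares the model's
    output on the original input with its output on the transformed input.
    """
    failures = []
    for source in inputs:
        out_source = model(source)
        out_follow_up = model(transform(source))
        if out_source != out_follow_up:
            failures.append((source, out_source, out_follow_up))
    return failures


inputs = ["this movie is good", "this movie is bad"]
failures = run_metamorphic_test(toy_sentiment_model, uppercase_transform, inputs)
for source, before, after in failures:
    print(f"Inconsistent on {source!r}: {before} -> {after}")
```

Here the first input triggers a violation because the toy model misses "GOOD" in upper case, while the second input passes. A real setup would replace the toy model with calls to the LLM under test and use transformations suited to each benchmark task.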
GPT-4