AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
LLMORPH: Automated Metamorphic Testing of Large Language Models
Researchers have developed LLMORPH, an automated testing tool for Large Language Models that uses Metamorphic Testing to identify faulty behaviors without requiring human-labeled data. The tool was evaluated on GPT-4, LLAMA3, and HERMES 2 across four NLP benchmarks, running over 561,000 test executions and exposing inconsistencies in the models' outputs.
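To make the idea concrete, here is a minimal sketch of a metamorphic test in Python. It is not LLMORPH's implementation: the model is a hypothetical toy stand-in for an LLM call, and the upper-casing relation is an assumed example of a meaning-preserving transformation. The point is only that the check compares two outputs of the same model, so no labeled answer is needed.

```python
from typing import Callable, List, Tuple


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call.

    It is deliberately case-sensitive, which acts as a seeded fault.
    """
    return "positive" if "good" in text else "negative"


def to_upper(text: str) -> str:
    """Assumed metamorphic transformation: casing should not change sentiment."""
    return text.upper()


def run_metamorphic_tests(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (source input, source output, follow-up output) for each violation."""
    violations = []
    for source in inputs:
        follow_up = transform(source)
        source_output = model(source)
        follow_up_output = model(follow_up)
        # The relation: both inputs mean the same thing, so outputs must match.
        if source_output != follow_up_output:
            violations.append((source, source_output, follow_up_output))
    return violations


inputs = ["a good movie", "a dull movie"]
violations = run_metamorphic_tests(toy_sentiment_model, to_upper, inputs)
for source, before, after in violations:
    print(f"Inconsistent on {source!r}: {before} vs {after}")
```

With a real model, `toy_sentiment_model` would be replaced by a function that sends the prompt to the model and normalizes its answer; the comparison logic stays the same.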