AI-Driven Test Case Generation from Natural Language Requirements: A Survey of Techniques and Research Gaps
A comprehensive survey of AI and natural language processing (NLP) techniques for automating test case generation from natural language requirements identifies 21 primary studies across three evolutionary eras. It finds that no existing approach fully addresses all six critical quality dimensions (automation, ambiguity handling, domain applicability, traceability, evaluation thoroughness, and hallucination control), which points to significant gaps in current software testing automation.
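The survey's central criterion works like a checklist: an approach fully addresses the problem only if it covers all six dimensions. The minimal Python sketch below encodes that check. The six dimension names come from the survey. The `Approach` class, the `fully_covering` helper, and the binary addressed/not-addressed scoring are hypothetical simplifications, and the example entries are placeholders, not ratings of any surveyed study.

```python
from dataclasses import dataclass, field

# The six quality dimensions named in the survey.
DIMENSIONS = (
    "automation",
    "ambiguity_handling",
    "domain_applicability",
    "traceability",
    "evaluation_thoroughness",
    "hallucination_control",
)


@dataclass
class Approach:
    """Hypothetical record: a test-generation approach and the dimensions it addresses."""
    name: str
    addressed: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Reject dimension names outside the survey's six.
        unknown = self.addressed - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    def missing(self) -> list[str]:
        # Report gaps in the canonical dimension order.
        return [d for d in DIMENSIONS if d not in self.addressed]


def fully_covering(approaches: list[Approach]) -> list[str]:
    """Return the names of approaches that address every dimension."""
    return [a.name for a in approaches if not a.missing()]


if __name__ == "__main__":
    # Placeholder entries for illustration only; NOT data from the survey.
    approaches = [
        Approach("approach_A", {"traceability", "domain_applicability"}),
        Approach("approach_B", {"automation", "ambiguity_handling",
                                "evaluation_thoroughness"}),
    ]
    for a in approaches:
        print(a.name, "missing:", a.missing())
    print("fully covering:", fully_covering(approaches))
```

Binary scoring is the simplest possible reading. A real assessment would likely use graded or evidence-backed ratings per dimension.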
This systematic review examines an important intersection of artificial intelligence and software engineering: automating test case generation from natural language requirements. The work addresses a persistent industry pain point, the substantial development time and resources that testing consumes, by surveying how recent advances in large language models (LLMs) and NLP can streamline the process. The authors' three-era evolutionary framework traces the progression from earlier rule-based approaches to modern AI-driven solutions, giving clear historical context for current capabilities and limitations.
The core finding, that no approach simultaneously satisfies all six quality dimensions, has significant implications for software development teams. Although automation and basic NLP techniques have matured, the survey highlights persistent weaknesses: LLM hallucination can introduce spurious or incorrect test cases, weak traceability between requirements and tests complicates root-cause analysis of defects, and inconsistent evaluation metrics make tools difficult to compare. These challenges raise deployment risk, particularly in safety-critical domains where verifying test coverage is non-negotiable.
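To make the traceability and hallucination concerns concrete, here is a minimal Python sketch, assuming each generated test is tagged with the ID of the requirement it claims to cover. This is an illustrative check, not a technique taken from the survey. The `GeneratedTest` type, the `trace_report` function, and the `REQ-n` / `Tn` identifiers are all hypothetical. The check flags tests that cite requirements that do not exist, which is one cheap signal of fabricated output, and requirements with no linked test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedTest:
    """Hypothetical generator output tagged with the requirement it claims to cover."""
    test_id: str
    requirement_id: str
    steps: str


def trace_report(requirement_ids: set[str],
                 tests: list[GeneratedTest]) -> dict[str, list[str]]:
    """Cross-check generated tests against the known requirement IDs.

    orphan_tests: tests citing a requirement ID that does not exist.
    uncovered_requirements: requirements with no linked test.
    """
    orphans = sorted(t.test_id for t in tests
                     if t.requirement_id not in requirement_ids)
    covered = {t.requirement_id for t in tests}
    uncovered = sorted(requirement_ids - covered)
    return {"orphan_tests": orphans, "uncovered_requirements": uncovered}


if __name__ == "__main__":
    reqs = {"REQ-1", "REQ-2", "REQ-3"}
    tests = [
        GeneratedTest("T1", "REQ-1", "submit valid form; expect success"),
        GeneratedTest("T2", "REQ-9", "cites a requirement that does not exist"),
    ]
    print(trace_report(reqs, tests))
    # {'orphan_tests': ['T2'], 'uncovered_requirements': ['REQ-2', 'REQ-3']}
```

This catches only one narrow failure mode. A test can cite a valid requirement ID and still assert the wrong behavior, which is why human review of generated tests remains necessary.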
For the development industry, this research confirms both the promise and the risks of applying generative AI to quality assurance. Organizations exploring AI-driven testing tools should recognize that current solutions still require human oversight and domain expertise despite their automation capabilities. The survey also sets out four actionable research guidelines, targeting hallucination mitigation, traceability mechanisms, complexity sensitivity, and compliance frameworks, which give tool developers and researchers a roadmap. As enterprises increasingly adopt LLM-based development assistance, this work establishes evidence-based expectations for what automated testing solutions can realistically deliver.
- →No existing AI approach for test case generation simultaneously addresses automation, ambiguity handling, domain applicability, traceability, evaluation thoroughness, and hallucination control.
- →LLM hallucination and reduced traceability remain critical technical barriers to enterprise adoption of AI-driven testing automation.
- →The field shows clear evolutionary progress across three eras, with modern LLM approaches offering greater automation but introducing new quality assurance challenges.
- →Current evaluation methodologies lack standardization, making it difficult to objectively compare different test generation tools and frameworks.
- →Development teams should treat AI-driven testing as a productivity enhancement rather than a fully autonomous solution; generated tests still require human verification.