Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports
Researchers introduce a standardized taxonomy for classifying invalid bug reports and develop AI methods that automatically identify root causes and generate no-code fixes. Comparing retrieval-augmented generation (RAG), vanilla LLMs, and agentic web search, they report a 66% weighted F1-score for subclassification and a 68.9% success rate for fix generation, pointing to significant potential for automating customer support workflows.
This research addresses a persistent operational challenge in software development: the substantial resources spent manually triaging invalid bug reports that require no code changes. By systematizing how invalid reports are categorized and establishing AI-driven solutions to handle them automatically, the study targets a measurable source of inefficiency in technical support organizations. The work demonstrates that machine learning approaches can meaningfully reduce the manual burden, with retrieval augmented generation showing the strongest performance for root-cause identification at 66% weighted F1-score.
The findings show that performance differs sharply across invalid report subcategories. Non-reproducibility cases are handled most effectively (85% F1), while Wrong Version cases remain challenging (F1 of 0.00-0.29), suggesting that current AI approaches suit some problem types far better than others. For fix generation specifically, agentic web search achieves the highest success rate at 68.9%, outperforming both RAG and vanilla LLM approaches, which indicates that external information retrieval improves practical solution quality.
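Because the headline subclassification figure is a weighted F1-score, per-subcategory results are averaged in proportion to how many reports each subcategory contains, so a rare subcategory with a very low F1 lowers the overall number only slightly. The sketch below illustrates that arithmetic in plain Python. The label strings and the four toy reports are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong for this report
            fn[truth] += 1  # true label was missed for this report
    support = Counter(y_true)
    scores = {}
    for label, count in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, count)
    return scores


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(count for _, count in scores.values())
    return sum(f1 * count for f1, count in scores.values()) / total


# Hypothetical toy data: three well-handled reports and one missed rare case.
toy_true = ["non_reproducible", "non_reproducible", "non_reproducible", "wrong_version"]
toy_pred = ["non_reproducible", "non_reproducible", "non_reproducible", "non_reproducible"]
toy_score = weighted_f1(toy_true, toy_pred)
```

In this toy case the rare class scores an F1 of zero, yet the weighted average stays well above zero because the common class dominates the support. This is why the per-subcategory breakdown matters alongside the single weighted number.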
For software development organizations and customer support teams, these results offer a pathway to reduce operational friction. With a 66% weighted F1-score for subclassification and a 68.9% success rate for fix generation, automated handling of invalid reports could meaningfully decrease support queue volume, allowing teams to focus on genuine engineering issues. However, the variable performance across subcategories means that any deployment needs to decide carefully which report types to automate and which to escalate. The research establishes baseline metrics for this emerging capability, creating a foundation for continued improvement in developer tooling and support infrastructure.
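One way to act on the automate-versus-escalate question is to gate automation on the measured per-subcategory score. The following is a minimal sketch under stated assumptions, not the study's method: the `route` function, the `Routing` type, and the 0.80 threshold are hypothetical, and only the two subcategory scores mentioned in this summary are included, with the Wrong Version value read as a 0-1 score.

```python
from dataclasses import dataclass

# Hypothetical cut-off for trusting automation; not a value from the study.
AUTOMATION_THRESHOLD = 0.80

# Per-subcategory F1 as reported in this summary, expressed on a 0-1 scale.
SUBCATEGORY_F1 = {
    "Non-reproducibility": 0.85,
    "Wrong Version": 0.29,  # upper end of the reported range (assumed 0-1 scale)
}


@dataclass
class Routing:
    subcategory: str
    action: str  # either "automate" or "escalate"


def route(subcategory, f1_by_subcategory=SUBCATEGORY_F1, threshold=AUTOMATION_THRESHOLD):
    """Automate only subcategories with a known F1 at or above the threshold."""
    f1 = f1_by_subcategory.get(subcategory)
    if f1 is not None and f1 >= threshold:
        return Routing(subcategory, "automate")
    # Unknown or weakly handled subcategories go to a human.
    return Routing(subcategory, "escalate")
```

Defaulting unknown subcategories to escalation keeps the failure mode conservative: a report is only answered automatically when there is evidence that its subcategory is handled well.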
- Retrieval-augmented generation achieves 66% weighted F1-score for invalid bug report subclassification, outperforming vanilla LLMs and web search approaches
- Agentic web search delivers the highest fix generation success rate at 68.9%, suggesting external information retrieval improves practical solution quality
- Performance varies significantly by subcategory, with Non-reproducibility at 85% F1 but Wrong Version remaining challenging at an F1 of 0.00-0.29
- Standardized taxonomy for invalid report classification establishes benchmarks for automating customer support and reducing manual triage burden
- Current AI approaches successfully handle approximately two-thirds of invalid bug report cases, enabling selective automation of support workflows