y0news

#software-testing News & Analysis

7 articles tagged with #software-testing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy

Researchers introduce VALTEST, a framework that uses semantic entropy to automatically validate test cases generated by Large Language Models, addressing the problem of invalid or hallucinated tests that mislead AI programming agents. The system improves test validity by up to 29% and enhances code generation performance through better filtering of LLM-generated test cases.
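The summary does not say how VALTEST computes semantic entropy, so the following is only a rough illustration of the general idea, not the paper's method. It assumes that several outputs have been sampled for the same test input, that outputs with the same meaning can be grouped by a simple normalization (a hypothetical stand-in for a real semantic-equivalence check), and that entropy over the resulting groups signals how much the samples disagree. The function name `semantic_entropy` and its parameters are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def semantic_entropy(
    samples: Iterable[str],
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> float:
    """Shannon entropy (in bits) over groups of equivalent sampled outputs.

    Assumption: two outputs mean the same thing when their normalized
    strings are equal. A real system would need a stronger equivalence check.
    """
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    # Sum -p * log2(p) over each group of equivalent outputs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Samples that all agree give zero entropy; an even split gives one bit.
consistent = semantic_entropy(["4", "4 ", " 4"])
split = semantic_entropy(["4", "5"])
```

Under these assumptions, a test case whose sampled expected outputs have high entropy would be a candidate for filtering, since the model itself does not agree on the answer.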

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Multi-modal user interface control detection using cross-attention

Researchers have developed an enhanced version of YOLOv5 that combines visual and textual data through cross-attention mechanisms to improve UI control detection in software screenshots. Tested on over 16,000 annotated images across 23 control classes, the multi-modal approach significantly outperforms pixel-only detection, with convolutional fusion showing the strongest results for semantically complex elements.

AI · Neutral · arXiv – CS AI · Apr 6 · 6/10

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Researchers introduced GBQA, a new benchmark with 30 games and 124 verified bugs to test whether large language models can autonomously discover software bugs. The best-performing model, Claude-4.6-Opus, identified only 48.39% of the bugs, highlighting how difficult autonomous bug detection remains.

Claude
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation

Researchers developed LSPRAG, a new framework that uses Language Server Protocol backends to help Large Language Models generate unit tests across multiple programming languages in real time. The system achieved significant improvements in test coverage, with increases of up to 213% for Java, 174% for Go, and 31% for Python compared to existing methods.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 19

Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators

Researchers developed Once4All, an LLM-assisted fuzzing framework for testing SMT solvers that addresses syntax validity issues and computational overhead. The system found 43 confirmed bugs in leading solvers Z3 and cvc5, with 40 already fixed by developers.

AI · Neutral · arXiv – CS AI · Mar 27 · 5/10

From Untestable to Testable: Metamorphic Testing in the Age of LLMs

A research paper proposes metamorphic testing as a way to test AI and LLM-integrated software systems. The approach addresses the challenge of unreliable LLM outputs and limited labeled ground truth by using relationships between the outputs of multiple test executions as test oracles, rather than comparing each output against a known correct answer.
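As a minimal sketch of the general technique, and not code from the paper, the example below checks one metamorphic relation: a meaning-preserving change to the input should not change the output. The function `classify_sentiment` is a hypothetical keyword-based stand-in for an LLM-backed component, and the two transforms are illustrative assumptions.

```python
from typing import Callable


def classify_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed classifier (keyword-based)."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & positive) - len(words & negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def holds_invariance(
    system: Callable[[str], str],
    source_input: str,
    transform: Callable[[str], str],
) -> bool:
    """Metamorphic relation: the output is unchanged under the transform.

    No ground-truth label is needed; only two executions are compared.
    """
    return system(source_input) == system(transform(source_input))


def add_neutral_suffix(text: str) -> str:
    # Assumed to be meaning-preserving for sentiment.
    return text + " Thanks for reading."


def to_upper(text: str) -> str:
    # Assumed to be meaning-preserving for sentiment.
    return text.upper()


review = "The battery life is great!"
suffix_ok = holds_invariance(classify_sentiment, review, add_neutral_suffix)
upper_ok = holds_invariance(classify_sentiment, review, to_upper)
```

A violated relation flags a likely defect even though the correct label for `review` is never supplied, which is what makes the approach usable when labeled ground truth is scarce.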

AI · Neutral · arXiv – CS AI · Mar 3 · 3/10 · 4

Test Case Prioritization: A Snowballing Literature Review and TCPFramework with Approach Combinators

Researchers conducted a comprehensive literature review of test case prioritization (TCP) techniques and developed a new framework with ensemble methods called approach combinators. The study analyzed 324 TCP-related studies and proposed new evaluation metrics, with their methods achieving up to 2.7% reduction in regression testing time while performing comparably to state-of-the-art algorithms.