y0news

#ai-reliability News & Analysis

154 articles tagged with #ai-reliability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 7 · 5/10

Measuring LLM Trust Allocation Across Conflicting Software Artifacts

Researchers developed TRACE, a framework to evaluate how LLMs allocate trust between conflicting software artifacts like code, documentation, and tests. The study found that current LLMs are better at identifying natural-language specification issues than detecting subtle code-level problems, with models showing systematic blind spots when implementations drift while documentation remains plausible.

AI · Neutral · arXiv – CS AI · Mar 27 · 5/10

From Untestable to Testable: Metamorphic Testing in the Age of LLMs

A research paper introduces metamorphic testing as a solution for testing AI and LLM-integrated software systems. The approach addresses the challenge of unreliable LLM outputs and limited labeled ground truth by using relationships between multiple test executions as test oracles.
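
To make the idea concrete, here is a minimal sketch of a metamorphic test, not the paper's own method. It assumes a hypothetical deterministic classifier, `toy_sentiment`, standing in for an LLM-backed component, and two illustrative relations: padding whitespace and changing letter case should not change the output. No labeled ground truth is needed, because the oracle is the agreement between two executions.

```python
from typing import Callable, List


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed classifier (deterministic)."""
    positive = {"good", "great", "reliable"}
    negative = {"bad", "broken", "flaky"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def pad_whitespace(text: str) -> str:
    """Transformation that should not change the meaning of the input."""
    return "  " + text.replace(" ", "  ") + "  "


def upper_case(text: str) -> str:
    """Another meaning-preserving transformation (assumed for this sketch)."""
    return text.upper()


def check_invariance(
    system: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[str]:
    """Return the inputs whose output changes under the transformation."""
    return [x for x in inputs if system(x) != system(transform(x))]


inputs = ["the build is reliable", "tests are flaky and broken", "nothing to report"]
violations_ws = check_invariance(toy_sentiment, pad_whitespace, inputs)
violations_case = check_invariance(toy_sentiment, upper_case, inputs)


def case_sensitive(text: str) -> str:
    """Deliberately fragile system, used to show a detected violation."""
    return "positive" if "good" in text else "neutral"


violations_fragile = check_invariance(case_sensitive, upper_case, ["good day"])
```

With a real LLM the outputs are not deterministic, so the equality check would likely need to be relaxed, for example to agreement over several samples or to a similarity threshold.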

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Generative AI in Managerial Decision-Making: Redefining Boundaries through Ambiguity Resolution and Sycophancy Analysis

A research study examined how generative AI models perform in business decision-making contexts, particularly their ability to detect ambiguity and resist sycophantic behavior. The study found that while AI excels at identifying contradictions and contextual ambiguities, it struggles with linguistic nuances and requires human oversight to function as a reliable strategic partner.

AI · Neutral · Apple Machine Learning · Mar 3 · 5/10

Learning to Reason for Hallucination Span Detection

Researchers are developing new methods to detect hallucinations in large language models by identifying specific spans of unsupported content rather than making binary decisions. The study evaluates Chain-of-Thought reasoning approaches to improve the complex multi-step process of hallucination span detection in LLMs.
