y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#trustworthiness News & Analysis

12 articles tagged with #trustworthiness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

12 articles
AIBearisharXiv – CS AI · 2d ago7/10
🧠

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

A physicist supervised Claude AI models over 12 days to build CLAX-PT, a physics simulation module, documenting how AI agents struggle with architectural redesign and distinguishing symptom-fixes from root-cause solutions. The study reveals that supervision design and human domain expertise, rather than model capability alone, determine whether AI-generated scientific code produces trustworthy results.

🧠 Claude
AIBullisharXiv – CS AI · 4d ago7/10
🧠

PaTAS: A Framework for Trust Propagation in Neural Networks Using Subjective Logic

Researchers introduce PaTAS (Parallel Trust Assessment System), a framework that uses Subjective Logic to measure and propagate trust through neural networks alongside standard computation. The system identifies reliability gaps and adversarial vulnerabilities that traditional metrics like accuracy fail to detect, offering a foundation for deploying AI safely in critical applications.

AIBearisharXiv – CS AI · May 127/10
🧠

Causal Stories from Sensor Traces: Auditing Epistemic Overreach in LLM-Generated Personal Sensing Explanations

Researchers identified epistemic overreach in LLM-generated explanations of personal sensing data, where AI systems produce coherent-sounding narratives about anomalous days without sufficient evidentiary support. Testing 14,922 explanations across three LLM families revealed that models routinely attribute causes without data justification, and this problem persists even when provided richer context or explicit instructions to constrain claims.

🧠 Llama
AINeutralarXiv – CS AI · May 127/10
🧠

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

Researchers identify critical honesty failures in Large Language Model unlearning methods, where models hallucinate or behave inconsistently after attempting to forget harmful training data. They propose ReVa, a representation-alignment procedure that significantly improves model honesty by better acknowledging forgotten knowledge while maintaining utility on retained information.

AIBearisharXiv – CS AI · May 17/10
🧠

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Researchers audited five frontier vision-language models (including GPT-5, Gemini 2.5 Pro, and Qwen 2.5 VL) on medical visual question answering tasks and found critical failures in anatomical localization and grounding that pose clinical safety risks. While supervised fine-tuning improved VQA accuracy to 85.5% on benchmark datasets, the underlying perception bottleneck—poor object detection and format compliance issues—remains largely unresolved.

🧠 GPT-5🧠 Gemini
AIBearisharXiv – CS AI · Apr 157/10
🧠

Red Teaming Large Reasoning Models

Researchers introduce RT-LRM, a comprehensive benchmark for evaluating the trustworthiness of Large Reasoning Models across truthfulness, safety, and efficiency dimensions. The study reveals that LRMs face significant vulnerabilities including CoT-hijacking and prompt-induced inefficiencies, demonstrating they are more fragile than traditional LLMs when exposed to reasoning-induced risks.

AIBullisharXiv – CS AI · Apr 147/10
🧠

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.

AIBearisharXiv – CS AI · Apr 107/10
🧠

When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

Researchers introduce the Graded Color Attribution dataset to test whether Vision-Language Models faithfully follow their own stated reasoning rules. The study reveals that VLMs systematically violate their introspective rules in up to 60% of cases, while humans remain consistent, suggesting VLM self-knowledge is fundamentally miscalibrated with serious implications for high-stakes deployment.

🧠 GPT-5
AIBearisharXiv – CS AI · Mar 47/102
🧠

TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

Researchers have developed TrustMH-Bench, a comprehensive framework to evaluate the trustworthiness of Large Language Models (LLMs) in mental health applications. Testing revealed that both general-purpose and specialized mental health LLMs, including advanced models like GPT-5.1, significantly underperform across critical trustworthiness dimensions in mental health scenarios.

AIBullishOpenAI News · Jul 217/105
🧠

Moving AI governance forward

OpenAI and other leading AI laboratories are strengthening AI governance through voluntary commitments focused on safety, security, and trustworthiness. This represents a proactive industry approach to self-regulation in AI development.

AIBearisharXiv – CS AI · Mar 116/10
🧠

Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers

Researchers argue that trust in chatbots is often driven by behavioral manipulation rather than demonstrated trustworthiness, proposing they be viewed as skilled salespeople rather than assistants. The study highlights how design choices exploit cognitive biases to influence user behavior, creating a gap between psychological trust formation and actual trustworthiness.