#trustworthy-ai News & Analysis

52 articles tagged with #trustworthy-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

EnTrust: Modeling Inter-Modal Conflict for Trustworthy Multimodal Medical Image Analysis

EnTrust is a new framework for multimodal medical image analysis that treats disagreement between imaging modalities as a direct source of predictive uncertainty rather than averaging it away. The approach combines feature decomposition, diffusion-based segmentation, and calibrated uncertainty estimation to help clinicians understand not just where predictions are uncertain, but why, achieving state-of-the-art accuracy across multiple medical imaging domains.
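
To see why averaging can hide disagreement, consider a minimal sketch, assuming each modality yields a per-pixel class probability vector. It is not EnTrust's method, which the summary says also involves feature decomposition and diffusion-based segmentation; it uses the Jensen-Shannon divergence as a stand-in conflict score, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in nats, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def fuse_with_conflict(p_a: List[float], p_b: List[float]) -> Tuple[List[float], float]:
    """Average two modality predictions, but also report how much they disagree.

    Returns (fused probabilities, conflict score in [0, 1]).
    """
    fused = [(a + b) / 2 for a, b in zip(p_a, p_b)]
    conflict = js_divergence(p_a, p_b) / math.log(2)  # normalize by the ln 2 upper bound
    return fused, conflict


# Two voxels whose averaged predictions are identical:
print(fuse_with_conflict([0.5, 0.5], [0.5, 0.5]))  # modalities agree the voxel is ambiguous
print(fuse_with_conflict([1.0, 0.0], [0.0, 1.0]))  # modalities flatly contradict each other
```

Both calls return the same fused vector, but only the second reports maximal conflict, which is exactly the signal a plain average discards.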

AI · Neutral · arXiv – CS AI · Jun 19 · 7/10

A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models

Researchers present a comprehensive evaluation framework for black-box uncertainty estimation methods in large language models, benchmarking 24 methods across 4 models and datasets. The study reveals that no single approach dominates universally, but hybrid methods combining multiple uncertainty signals and candidate-reasoning approaches consistently outperform others, addressing critical gaps in trustworthy LLM deployment.
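
One of the simplest black-box signals is self-consistency: sample the model several times and measure how often the answers agree. The sketch below is a generic illustration, not one of the 24 benchmarked methods specifically; it assumes answers can be compared by exact match after lowercasing and trimming, which real free-form outputs usually cannot.

```python
import math
from collections import Counter
from typing import List, Tuple


def consistency_confidence(samples: List[str]) -> Tuple[str, float, float]:
    """Score agreement among repeated samples from a black-box model.

    Returns (majority answer, agreement rate, normalized entropy).
    Higher agreement and lower entropy suggest lower uncertainty.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Naive answer matching: normalize case and surrounding whitespace only.
    counts = Counter(s.strip().lower() for s in samples)
    answer, top = counts.most_common(1)[0]
    n = len(samples)
    probs = [c / n for c in counts.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    # Entropy peaks at ln(n) when every sample is distinct.
    max_entropy = math.log(n) if n > 1 else 1.0
    return answer, top / n, entropy / max_entropy


print(consistency_confidence(["Paris", "paris ", "Paris", "Lyon"]))
```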

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

AXIOM: A Trust-First Neuro-Symbolic Execution Architecture for Verifiable Mathematical Reasoning

AXIOM is a neuro-symbolic architecture that pairs language models with deterministic computer algebra systems to solve mathematical problems with verifiable correctness. The system achieves 94.36% accuracy on MATH benchmarks with zero incorrect answers among those it returns as confident, and has processed roughly 30,000 production queries, establishing a framework for trustworthy AI systems that prioritize verifiability over raw performance.
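
The "answer only when verified" pattern can be illustrated without a full computer algebra system. The sketch below is an assumption-laden stand-in, not AXIOM's implementation: exact rational arithmetic from Python's standard `fractions` module checks a language model's proposed roots of a polynomial, and the system abstains (returns `None`) rather than answer if any candidate fails. `verified_roots` is a hypothetical name.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Union

Number = Union[int, str, Fraction]


def eval_poly(coeffs: Sequence[int], x: Fraction) -> Fraction:
    """Evaluate a polynomial exactly with Horner's rule.

    coeffs run from the highest-degree term down to the constant term.
    """
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * x + c
    return acc


def verified_roots(coeffs: Sequence[int], candidates: Sequence[Number]) -> Optional[List[Fraction]]:
    """Accept model-proposed roots only if each one checks out exactly.

    Returns the roots as Fractions, or None (abstain) if any candidate fails.
    Note: this confirms each candidate is a root, not that the list is complete.
    """
    roots = [Fraction(c) for c in candidates]
    if all(eval_poly(coeffs, r) == 0 for r in roots):
        return roots
    return None


# x^2 - 5x + 6 = 0: a correct proposal and a wrong one.
print(verified_roots([1, -5, 6], [2, 3]))
print(verified_roots([1, -5, 6], [2, 4]))
```

Exact fractions avoid the false accepts and rejects that floating-point tolerance checks can produce, which is the property a "zero incorrect confident answers" design depends on.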

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

Researchers present a hybrid neuro-symbolic architecture that combines formal logic with neural semantic analysis to verify LLM outputs in high-stakes domains like healthcare. The system achieves hallucination detection rates of over 83% for structured data and 72% for semantic fabrications while reducing report creation time by 30%, demonstrating practical safeguards for deploying LLMs in data-sensitive applications.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Deep Arguing

Researchers introduce Deep Arguing, a neurosymbolic method that combines deep learning with argumentation reasoning to create interpretable AI classification models. The approach constructs argumentative structures where data points support or attack predictions, enabling end-to-end learning while providing human-understandable explanations for model decisions.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

FairHealth: An Open-Source Python Library for Trustworthy Healthcare AI in Low-Resource Settings

FairHealth is an open-source Python library designed to address critical gaps in healthcare AI for low-resource settings, particularly in low-income countries. The toolkit integrates fairness auditing, privacy-preserving federated learning, explainability tools, and Global South datasets into a unified framework, making trustworthy AI more accessible to underserved healthcare systems.
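
As an illustration of what a fairness audit computes, here is a minimal sketch of one common metric, the demographic parity difference (the gap between the highest and lowest positive-prediction rates across groups). It does not use or reproduce FairHealth's API; the function names are hypothetical and the example data is made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def selection_rates(y_pred: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of positive (1) predictions within each group."""
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Illustrative predictions for two hypothetical clinic sites.
preds = [1, 1, 0, 0, 1, 0, 0, 0]
sites = ["urban"] * 4 + ["rural"] * 4
print(selection_rates(preds, sites))
print(demographic_parity_difference(preds, sites))
```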

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery

Researchers propose a machine unlearning framework to detect and remove neural backdoors—hidden triggers inserted during AI training that can compromise system integrity. Using model inversion and statistical analysis, the approach identifies malicious patterns and autonomously detaches machine behavior from backdoor triggers, addressing a critical cybersecurity vulnerability in AI systems.
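
The summary does not detail the paper's statistical test. A widely used stand-in from earlier backdoor-detection work (Neural Cleanse) reverse-engineers a minimal trigger for every output label via model inversion, then flags labels whose trigger is anomalously small using a median-absolute-deviation score. The sketch below covers only that statistical step and assumes the per-label trigger norms were already computed; the numbers are invented.

```python
import math
import statistics
from typing import Dict, List


def anomaly_indices(trigger_norms: Dict[str, float]) -> Dict[str, float]:
    """MAD-based anomaly index per label; large positive = suspiciously small trigger."""
    values = list(trigger_norms.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation under normality
    if scale == 0:
        # Degenerate case: no spread, so any norm below the median is maximally anomalous.
        return {k: (math.inf if v < med else 0.0) for k, v in trigger_norms.items()}
    return {k: (med - v) / scale for k, v in trigger_norms.items()}


def flag_backdoored(trigger_norms: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Labels whose reverse-engineered trigger is abnormally small."""
    return sorted(k for k, idx in anomaly_indices(trigger_norms).items() if idx > threshold)


# Hypothetical L1 norms of the minimal trigger found for each of five labels.
norms = {"0": 10.0, "1": 11.0, "2": 9.5, "3": 10.5, "4": 2.0}
print(flag_backdoored(norms))
```

The intuition is that a backdoored label needs only a tiny trigger to flip any input to it, so its norm sits far below the median of the others.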

AI · Neutral · arXiv – CS AI · Apr 20 · 7/10

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

A new survey examines intrinsic interpretability approaches for Large Language Models, categorizing design methods that build transparency directly into model architectures rather than applying post-hoc explanations. The research identifies five key paradigms—functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction—addressing the critical challenge of making LLMs more trustworthy and safer for deployment.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Researchers introduce RePAIR, a framework enabling users to instruct large language models to forget harmful knowledge, misinformation, and personal data through natural language prompts at inference time. The system uses a training-free method called STAMP that manipulates model activations to achieve selective unlearning with minimal computational overhead, outperforming existing approaches while preserving model utility.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

A Two-Stage LLM Framework for Accessible and Verified XAI Explanations

Researchers propose a two-stage LLM framework that uses one model to translate XAI technical outputs into natural language and a second model to verify accuracy, faithfulness, and completeness before delivering explanations to users. The framework includes iterative refinement mechanisms and demonstrates improved reliability across multiple XAI techniques and LLM families.

AI × Crypto · Bullish · arXiv – CS AI · Apr 7 · 7/10

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

Researchers introduce the Agentic Risk Standard (ARS), a payment settlement framework for AI-mediated transactions that provides contractual compensation for agent failures. The standard shifts trust from implicit model behavior expectations to explicit, measurable guarantees through financial risk management principles.

AI · Bullish · arXiv – CS AI · Mar 27 · 7/10

Decidable By Construction: Design-Time Verification for Trustworthy AI

Researchers propose a framework for verifying AI model properties at design time rather than after deployment, using algebraic constraints over finitely generated abelian groups. The approach eliminates the computational overhead of post-hoc verification by building trustworthiness into the model architecture from the start.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Efficient Federated Conformal Prediction with Group-Conditional Guarantee

Researchers propose group-conditional federated conformal prediction (GC-FCP), a new protocol that enables trustworthy AI uncertainty quantification across distributed clients while providing coverage guarantees for specific groups. The framework addresses challenges in federated learning for applications in healthcare, finance, and mobile sensing by creating compact weighted summaries that support efficient calibration.
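
Group-conditional coverage is easiest to see in the centralized case. The sketch below is plain split conformal prediction with a separate calibration quantile per group (sometimes called Mondrian conformal prediction). It assumes the nonconformity score 1 - p(true label) and a group label known at prediction time, and it omits the federated part entirely, i.e. the compact weighted summaries that GC-FCP uses to calibrate across clients without pooling their data.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample (1 - alpha) quantile used by split conformal prediction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set must include every label
    return sorted(scores)[k - 1]


def group_thresholds(cal_scores: Sequence[float], cal_groups: Sequence[Hashable],
                     alpha: float) -> Dict[Hashable, float]:
    """Calibrate one threshold per group so coverage holds within each group."""
    by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for score, group in zip(cal_scores, cal_groups):
        by_group[group].append(score)
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}


def prediction_set(class_probs: Dict[str, float], group: Hashable,
                   thresholds: Dict[Hashable, float]) -> List[str]:
    """Keep every label whose score 1 - p(label) is within the group's threshold."""
    q = thresholds[group]
    return [label for label, p in class_probs.items() if 1 - p <= q]


# Calibration scores are 1 - p(true label) on held-out data (values invented).
th = group_thresholds([0.05, 0.10, 0.20, 0.30, 0.5, 0.6],
                      ["a", "a", "a", "a", "b", "b"], alpha=0.2)
probs = {"x": 0.75, "y": 0.2, "z": 0.05}
print(prediction_set(probs, "a", th))
print(prediction_set(probs, "b", th))
```

Group "b" has only two calibration points, so at alpha = 0.2 its threshold is infinite and every label is returned; this is how the finite-sample guarantee is preserved for small groups, at the cost of uninformative sets.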

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

How Meta-research Can Pave the Road Towards Trustworthy AI In Healthcare: Catalogue of Ideas and Roadmap for Future Research

Researchers convened a February 2025 workshop to explore how meta-research methodologies can enhance Trustworthy AI (TAI) implementation in healthcare. The study identifies key challenges including robustness, reproducibility, clinical integration, and transparency gaps, proposing a roadmap for interdisciplinary collaboration between TAI and meta-research fields.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

Researchers introduce RAG-Driver, a retrieval-augmented multi-modal large language model designed for autonomous driving that can provide explainable decisions and control predictions. The system addresses data scarcity and generalization challenges in AI-driven autonomous vehicles by using in-context learning and expert demonstration retrieval.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

Researchers propose a Trustworthy Federated Learning (TFL) framework that treats trust as a continuously maintained system condition rather than a static property, addressing challenges in AI systems with autonomous decision-making. The framework introduces Trust Report 2.0 as a privacy-preserving coordination blueprint for multi-stakeholder governance in federated learning deployments.

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 5

Architecting Trust in Artificial Epistemic Agents

Researchers propose a framework for developing trustworthy AI agents that function as epistemic entities, capable of pursuing knowledge goals and shaping information environments. The paper argues that as AI models increasingly replace traditional search methods and provide specialized advice, their calibration to human epistemic norms becomes critical to prevent cognitive deskilling and epistemic drift.

AI · Bullish · Fortune Crypto · Mar 2 · 7/10

Why Europe can lead in trusted, industrialized AI

Europe is positioning itself to lead in trustworthy, regulated AI by leveraging its regulatory frameworks and sovereign data control as competitive advantages. As AI evolves from conversational tools to autonomous agents, Europe's emphasis on trust and industrialization could unlock significant economic value and create a differentiated market position against competitors.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Stabilizing black-box algorithms through task-oriented randomization

Researchers present a task-oriented randomization methodology to stabilize black-box algorithms while accommodating diverse input data structures, with extensions to large language models and top-k ranking problems. The framework provides theoretical stability guarantees and analyzes the fundamental trade-off between stability and exploration, validated through numerical simulations and real-world datasets.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

"I Said Things I Needed to Hear Myself": Peer Support as an Emotional, Organisational, and Sociotechnical Practice in Singapore

A qualitative study of 20 peer supporters in Singapore examines how digital platforms mediate mental health support outside clinical systems. The research identifies design opportunities for culturally responsive AI tools that enhance rather than replace human connection in peer support contexts.

AI · Neutral · arXiv – CS AI · Jun 19 · 5/10

Confidence-Aware Automated Assessment of Student-Drawn Scientific Models

Researchers developed an automated Vision Transformer-based system to score student-drawn scientific models, addressing the costly manual assessment burden in science education. The confidence-aware framework selectively automates scoring of high-confidence submissions while deferring uncertain cases to human reviewers, demonstrating improved reliability across NGSS-aligned assessments.
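
The deferral logic can be sketched as a threshold on the model's confidence, chosen on validation data so that the automatically scored subset meets a target accuracy. This is an assumption about how such a framework might pick its cut-off, not the paper's procedure; the confidence is assumed to be something like the classifier's top softmax probability, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pick_threshold(confidences: Sequence[float], correct: Sequence[bool],
                   target_accuracy: float) -> Optional[float]:
    """Lowest confidence cut-off whose auto-scored validation subset meets the target.

    Lower cut-offs automate more submissions, so the lowest passing one
    maximizes coverage. Returns None if no cut-off reaches the target.
    """
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best: Optional[float] = None
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += int(ok)
        # Only evaluate at the end of a run of tied confidences, since a
        # threshold cannot split items with the same score.
        at_boundary = i == len(pairs) or pairs[i][0] != conf
        if at_boundary and hits / i >= target_accuracy:
            best = conf
    return best


def route(confidences: Sequence[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Split submission indices into auto-scored and deferred-to-human lists."""
    auto = [i for i, c in enumerate(confidences) if c >= threshold]
    deferred = [i for i, c in enumerate(confidences) if c < threshold]
    return auto, deferred


# Validation run: confidence of each prediction and whether it matched the human score.
t = pick_threshold([0.95, 0.9, 0.8, 0.6, 0.4], [True, True, True, False, True], 0.9)
print(t, route([0.97, 0.5, 0.85, 0.7], t))
```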

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization

Researchers propose methods to attack and defend continuous data summarization systems by exploiting vulnerabilities in similarity-based perturbations through DR-submodular optimization. The work demonstrates that adversarial attacks on upstream data processing can compromise trustworthy AI pipelines and proposes defense mechanisms with theoretical guarantees.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Neurosymbolic Learning for Inference-Time Argumentation

Researchers introduce Inference-Time Argumentation (ITA), a neurosymbolic framework that combines large language models with formal argumentation semantics for claim verification. The system generates arguments, scores them, and produces ternary (true/false/uncertain) predictions with faithful, inspectable reasoning structures rather than post-hoc justifications.
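
The ternary output can be illustrated with the grounded semantics of abstract argumentation, a standard formal semantics whose labelling assigns every argument IN, OUT, or UNDEC. The summary does not say whether ITA uses grounded semantics, and this sketch ignores the argument scores it mentions; mapping the claim's label to true/false/uncertain is an assumption made for illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

TERNARY = {"IN": "true", "OUT": "false", "UNDEC": "uncertain"}


def grounded_labelling(arguments: Iterable[str],
                       attacks: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Compute the grounded labelling of a Dung-style argumentation framework.

    attacks holds (attacker, target) pairs over the given arguments.
    """
    args = set(arguments)
    attackers: Dict[str, Set[str]] = {a: set() for a in args}
    for src, dst in attacks:
        attackers[dst].add(src)
    label: Dict[str, str] = {}
    changed = True
    while changed:  # iterate to the least fixpoint
        changed = False
        for a in args:
            if a in label:
                continue
            if all(label.get(b) == "OUT" for b in attackers[a]):
                label[a] = "IN"  # every attacker is defeated (or there are none)
                changed = True
            elif any(label.get(b) == "IN" for b in attackers[a]):
                label[a] = "OUT"  # attacked by an accepted argument
                changed = True
    # Anything never settled (e.g. mutual or self attacks) stays undecided.
    return {a: label.get(a, "UNDEC") for a in args}


def verdict(claim: str, arguments: Iterable[str],
            attacks: Iterable[Tuple[str, str]]) -> str:
    """Map the claim's grounded label to a ternary prediction."""
    return TERNARY[grounded_labelling(arguments, attacks)[claim]]


# "c" is attacked by "a", which is in turn attacked by the unchallenged "b".
print(verdict("c", ["a", "b", "c"], [("a", "c"), ("b", "a")]))
```

Because each label is justified by the labels of the attackers, the resulting structure can be inspected directly, which is the kind of faithfulness the summary contrasts with post-hoc justifications.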

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

An Abstract Architecture for Explainable Autonomy in Hazardous Environments

Researchers present an abstract architecture for building autonomous robotic systems that can explain their decision-making processes to human operators and regulators. The framework addresses the critical need for explainability in autonomous systems deployed in hazardous environments, with a practical application example in nuclear industry operations where trust and regulatory compliance are essential.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning

Researchers introduce TRUE (Trustworthy Unified Explanation Framework), a new methodology for interpreting and verifying the reasoning processes of large language models across multiple analytical levels. The framework combines executable verification, structural analysis, and causal failure mode detection to provide transparent insights into LLM decision-making, addressing critical gaps in current interpretability methods.

Page 1 of 3