#trustworthy-ai News & Analysis

37 articles tagged with #trustworthy-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

37 articles

AIBullisharXiv – CS AI · 4d ago7/10

🧠

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

Researchers present a hybrid neuro-symbolic architecture that combines formal logic with neural semantic analysis to verify LLM outputs in high-stakes domains like healthcare. The system achieves over 83% hallucination detection rates for structured data and 72% for semantic fabrications while reducing report creation time by 30%, demonstrating practical safeguards for deploying LLMs in data-sensitive applications.

AIBullisharXiv – CS AI · May 127/10

🧠

FairHealth: An Open-Source Python Library for Trustworthy Healthcare AI in Low-Resource Settings

FairHealth is an open-source Python library designed to address critical gaps in healthcare AI for low-resource settings, particularly in low-income countries. The toolkit integrates fairness auditing, privacy-preserving federated learning, explainability tools, and Global South datasets into a unified framework, making trustworthy AI more accessible to underserved healthcare systems.

AIBullisharXiv – CS AI · May 127/10

🧠

Deep Arguing

Researchers introduce Deep Arguing, a neurosymbolic method that combines deep learning with argumentation reasoning to create interpretable AI classification models. The approach constructs argumentative structures where data points support or attack predictions, enabling end-to-end learning while providing human-understandable explanations for model decisions.

AINeutralarXiv – CS AI · May 17/10

🧠

Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery

Researchers propose a machine unlearning framework to detect and remove neural backdoors—hidden triggers inserted during AI training that can compromise system integrity. Using model inversion and statistical analysis, the approach identifies malicious patterns and autonomously detaches machine behavior from backdoor triggers, addressing a critical cybersecurity vulnerability in AI systems.

AINeutralarXiv – CS AI · Apr 207/10

🧠

Towards Intrinsic Interpretability of Large Language Models:A Survey of Design Principles and Architectures

A new survey examines intrinsic interpretability approaches for Large Language Models, categorizing design methods that build transparency directly into model architectures rather than applying post-hoc explanations. The research identifies five key paradigms—functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction—addressing the critical challenge of making LLMs more trustworthy and safer for deployment.

AIBullisharXiv – CS AI · Apr 157/10

🧠

RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Researchers introduce RePAIR, a framework enabling users to instruct large language models to forget harmful knowledge, misinformation, and personal data through natural language prompts at inference time. The system uses a training-free method called STAMP that manipulates model activations to achieve selective unlearning with minimal computational overhead, outperforming existing approaches while preserving model utility.

AIBullisharXiv – CS AI · Apr 157/10

🧠

A Two-Stage LLM Framework for Accessible and Verified XAI Explanations

Researchers propose a two-stage LLM framework that uses one model to translate XAI technical outputs into natural language and a second model to verify accuracy, faithfulness, and completeness before delivering explanations to users. The framework includes iterative refinement mechanisms and demonstrates improved reliability across multiple XAI techniques and LLM families.

AI × CryptoBullisharXiv – CS AI · Apr 77/10

🤖

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

Researchers introduce the Agentic Risk Standard (ARS), a payment settlement framework for AI-mediated transactions that provides contractual compensation for agent failures. The standard shifts trust from implicit model behavior expectations to explicit, measurable guarantees through financial risk management principles.

AIBullisharXiv – CS AI · Mar 277/10

🧠

Decidable By Construction: Design-Time Verification for Trustworthy AI

Researchers propose a framework for verifying AI model properties at design time rather than after deployment, using algebraic constraints over finitely generated abelian groups. The approach eliminates computational overhead of post-hoc verification by building trustworthiness into the model architecture from the start.

AINeutralarXiv – CS AI · Mar 177/10

🧠

How Meta-research Can Pave the Road Towards Trustworthy AI In Healthcare: Catalogue of Ideas and Roadmap for Future Research

Researchers convened a February 2025 workshop to explore how meta-research methodologies can enhance Trustworthy AI (TAI) implementation in healthcare. The study identifies key challenges including robustness, reproducibility, clinical integration, and transparency gaps, proposing a roadmap for interdisciplinary collaboration between TAI and meta-research fields.

AINeutralarXiv – CS AI · Mar 177/10

🧠

Efficient Federated Conformal Prediction with Group-Conditional Guarantee

Researchers propose group-conditional federated conformal prediction (GC-FCP), a new protocol that enables trustworthy AI uncertainty quantification across distributed clients while providing coverage guarantees for specific groups. The framework addresses challenges in federated learning for applications in healthcare, finance, and mobile sensing by creating compact weighted summaries that support efficient calibration.

AIBullisharXiv – CS AI · Mar 97/10

🧠

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

Researchers introduce RAG-Driver, a retrieval-augmented multi-modal large language model designed for autonomous driving that can provide explainable decisions and control predictions. The system addresses data scarcity and generalization challenges in AI-driven autonomous vehicles by using in-context learning and expert demonstration retrieval.

AINeutralarXiv – CS AI · Mar 56/10

🧠

From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

Researchers propose Trustworthy Federated Learning (TFL) framework that treats trust as a continuously maintained system condition rather than static property, addressing challenges in AI systems with autonomous decision-making. The framework introduces Trust Report 2.0 as a privacy-preserving coordination blueprint for multi-stakeholder governance in federated learning deployments.

AINeutralarXiv – CS AI · Mar 46/105

🧠

Architecting Trust in Artificial Epistemic Agents

Researchers propose a framework for developing trustworthy AI agents that function as epistemic entities, capable of pursuing knowledge goals and shaping information environments. The paper argues that as AI models increasingly replace traditional search methods and provide specialized advice, their calibration to human epistemic norms becomes critical to prevent cognitive deskilling and epistemic drift.

AINeutralarXiv – CS AI · 3d ago6/10

🧠

REC-CBM: Rubric-Aware Error-Correction Concept Bottleneck Models for Trustworthy Open-Ended Grading

Researchers propose REC-CBM, a novel machine learning model that combines concept bottleneck models with rubric-aware error correction to automate open-ended educational grading while maintaining transparency and interpretability. Unlike black-box LLM systems, REC-CBM allows educators to verify scoring decisions through human-interpretable concept reasoning, addressing the growing need for trustworthy automated grading in educational settings.

AINeutralarXiv – CS AI · 4d ago6/10

🧠

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

Researchers introduce LexGuard, an adversarial AI framework that improves legal reasoning in large language models by distinguishing legally relevant changes from irrelevant perturbations. The system uses formal logic and SMT solvers to ground legal decisions in statute interpretation, addressing systematic failures in existing legal AI systems to maintain appropriate sensitivity to material legal facts.

AIBullisharXiv – CS AI · 4d ago6/10

🧠

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Researchers introduce ReasonOps, a unified operational framework that treats AI reasoning as a continuously monitored and verifiable process rather than isolated inference. The paradigm integrates formal verification, symbolic reasoning, and runtime assurance to address critical reliability gaps in LLM-based reasoning systems, particularly for safety-critical applications.

AINeutralarXiv – CS AI · May 126/10

🧠

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems

A dissertation presents research on scaling reinforcement learning across distributed systems while ensuring trustworthy behavior in AI applications. The work addresses communication efficiency in federated settings and alignment with human preferences in large language models, proposing that next-generation intelligent systems require both optimization efficiency and safety mechanisms.

AINeutralarXiv – CS AI · May 96/10

🧠

Safactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligence

Safactory is a new framework that integrates simulation, data management, and reinforcement learning to develop trustworthy autonomous AI agents. The system addresses fragmentation in existing agent infrastructure by creating a unified pipeline for continuous improvement and risk detection in long-horizon decision-making tasks.

AINeutralarXiv – CS AI · May 16/10

🧠

Efficient Preimage Approximation for Neural Network Certification

Researchers introduce PREMAP2, an advanced neural network certification tool that significantly improves scalability and efficiency for verifying AI model robustness. The method extends beyond worst-case analysis by estimating what proportion of inputs satisfy safety specifications, with new capabilities supporting convolutional networks and real-world adversarial scenarios like patch attacks.

AINeutralarXiv – CS AI · Apr 146/10

🧠

Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms

A comprehensive review examines explainable AI methods for human activity recognition (HAR) systems across wearable, ambient, and physiological sensors. The paper addresses the critical gap between deep learning's performance improvements and the opacity that limits real-world deployment, proposing a unified framework for understanding XAI mechanisms in HAR applications.

AINeutralarXiv – CS AI · Apr 146/10

🧠

Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

Researchers propose a human-centered framework for evaluating whether AI systems fail in ways similar to humans by measuring out-of-distribution performance across a spectrum of perceptual difficulty rather than arbitrary distortion levels. Testing this approach on vision models reveals that vision-language models show the most consistent human alignment, while CNNs and ViTs demonstrate regime-dependent performance differences depending on task difficulty.

AIBullisharXiv – CS AI · Mar 276/10

🧠

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

Researchers have introduced ElephantBroker, an open-source cognitive runtime system that combines knowledge graphs with vector storage to create more trustworthy AI agents with verifiable memory. The system implements comprehensive safety measures, evidence verification, and multi-organizational access controls for enterprise AI deployments.

AIBullisharXiv – CS AI · Mar 266/10

🧠

Learning To Guide Human Decision Makers With Vision-Language Models

Researchers introduce Learning to Guide (LTG), a new AI framework where machines provide interpretable guidance to human decision-makers rather than making automated decisions. The SLOG approach transforms vision-language models into guidance generators using human feedback, showing promise in medical diagnosis applications.

AIBearisharXiv – CS AI · Mar 176/10

🧠

Do Metrics for Counterfactual Explanations Align with User Perception?

A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.

Page 1 of 2Next →