#trustworthy-ai News & Analysis

52 articles tagged with #trustworthy-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Beyond Soft Masks: Hard-Perturbation Mixup Explainer for Robust GNN Explainability

Researchers propose HPME, a novel framework for explaining Graph Neural Network decisions using hard-perturbation mixup strategies instead of soft masks. The method addresses out-of-distribution issues in GNN explainability by extracting discrete subgraphs and employing structure-level replacement, achieving improved explanation fidelity across synthetic and real-world datasets.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Researchers introduce the Triangulated Preference Shift score, an automated metric that identifies lexical biases introduced during preference learning stages (like RLHF) in large language models without requiring manual curation. The metric isolates language pattern shifts across six model families, revealing that preference tuning may push models toward a 'language of prestige' that diverges from natural human language usage.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Domain-Shift-Aware Conformal Prediction for Large Language Models

Researchers propose Domain-Shift-Aware Conformal Prediction (DS-CP), a framework that improves reliability of large language model outputs by adapting conformal prediction methods to handle domain shift. The approach reweights calibration samples based on proximity to test prompts, delivering more reliable uncertainty quantification and reducing hallucinations in real-world deployments.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

dashi: A Python library for Dataset Shift Characterization to Support Trustworthy AI Development and Deployment

Researchers introduce dashi, an open-source Python library that detects and analyzes dataset shifts—changes between training and test data distributions—which can degrade AI model performance. The tool combines unsupervised statistical methods with supervised performance analysis to help developers identify data quality issues across temporal and multi-source environments, particularly relevant for high-stakes applications like healthcare AI.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

REC-CBM: Rubric-Aware Error-Correction Concept Bottleneck Models for Trustworthy Open-Ended Grading

Researchers propose REC-CBM, a novel machine learning model that combines concept bottleneck models with rubric-aware error correction to automate open-ended educational grading while maintaining transparency and interpretability. Unlike black-box LLM systems, REC-CBM allows educators to verify scoring decisions through human-interpretable concept reasoning, addressing the growing need for trustworthy automated grading in educational settings.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

Researchers introduce LexGuard, an adversarial AI framework that improves legal reasoning in large language models by distinguishing legally relevant changes from irrelevant perturbations. The system uses formal logic and SMT solvers to ground legal decisions in statute interpretation, addressing the systematic failure of existing legal AI systems to stay appropriately sensitive to material legal facts.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Researchers introduce ReasonOps, a unified operational framework that treats AI reasoning as a continuously monitored and verifiable process rather than isolated inference. The paradigm integrates formal verification, symbolic reasoning, and runtime assurance to address critical reliability gaps in LLM-based reasoning systems, particularly for safety-critical applications.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems

A dissertation presents research on scaling reinforcement learning across distributed systems while ensuring trustworthy behavior in AI applications. The work addresses communication efficiency in federated settings and alignment with human preferences in large language models, proposing that next-generation intelligent systems require both optimization efficiency and safety mechanisms.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Safactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligence

Safactory is a new framework that integrates simulation, data management, and reinforcement learning to develop trustworthy autonomous AI agents. The system addresses fragmentation in existing agent infrastructure by creating a unified pipeline for continuous improvement and risk detection in long-horizon decision-making tasks.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Efficient Preimage Approximation for Neural Network Certification

Researchers introduce PREMAP2, an advanced neural network certification tool that significantly improves scalability and efficiency for verifying AI model robustness. The method extends beyond worst-case analysis by estimating what proportion of inputs satisfy safety specifications, with new capabilities supporting convolutional networks and real-world adversarial scenarios like patch attacks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms

A comprehensive review examines explainable AI methods for human activity recognition (HAR) systems across wearable, ambient, and physiological sensors. The paper addresses the critical gap between deep learning's performance improvements and the opacity that limits real-world deployment, proposing a unified framework for understanding XAI mechanisms in HAR applications.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

Researchers propose a human-centered framework for evaluating whether AI systems fail in ways similar to humans by measuring out-of-distribution performance across a spectrum of perceptual difficulty rather than arbitrary distortion levels. Testing this approach on vision models reveals that vision-language models show the most consistent human alignment, while CNNs and ViTs show differences that depend on the difficulty regime.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

Researchers have introduced ElephantBroker, an open-source cognitive runtime system that combines knowledge graphs with vector storage to create more trustworthy AI agents with verifiable memory. The system implements comprehensive safety measures, evidence verification, and multi-organizational access controls for enterprise AI deployments.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Learning To Guide Human Decision Makers With Vision-Language Models

Researchers introduce Learning to Guide (LTG), a new AI framework where machines provide interpretable guidance to human decision-makers rather than making automated decisions. The SLOG approach transforms vision-language models into guidance generators using human feedback, showing promise in medical diagnosis applications.

AI · Bearish · arXiv – CS AI · Mar 17 · 6/10

Do Metrics for Counterfactual Explanations Align with User Perception?

A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option

Researchers developed a method to compute minimum-size abductive explanations for linear models with a reject option, addressing a key challenge in explainable AI for critical domains. The approach uses log-linear algorithms for accepted instances and integer linear programming for rejected instances, proving more efficient than existing methods despite theoretical NP-hardness.

AI · Bearish · arXiv – CS AI · Mar 17 · 6/10

The Scenic Route to Deception: Dark Patterns and Explainability Pitfalls in Conversational Navigation

Researchers warn that AI-powered conversational navigation systems using Large Language Models could transform route guidance from verifiable geometric tasks into manipulative dialogues. The study proposes a framework categorizing risks as dark patterns or explainability pitfalls, suggesting neuro-symbolic architectures to maintain trustworthiness.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model

Researchers introduce CUPID, a plug-in framework that estimates both aleatoric and epistemic uncertainty in deep learning models without requiring model retraining. The modular approach can be inserted into any layer of pretrained networks and provides interpretable uncertainty analysis for high-stakes AI applications.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9

Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study

Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach consistently performs well across all properties, with attention-based pooling excelling in generalization while second-order pooling provides better sensitivity.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 10

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Researchers developed a new inference-time safety mechanism for code-generating AI models that uses retrieval-augmented generation to identify and fix security vulnerabilities in real time. The approach leverages Stack Overflow discussions to guide AI code revision without requiring model retraining, improving security while maintaining interpretability.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Calibrating Verbalized Confidence with Self-Generated Distractors

Researchers introduce DINCO (Distractor-Normalized Coherence), a method to improve confidence calibration in large language models by using self-generated alternative claims to reduce overconfidence bias. The approach addresses LLM suggestibility issues that cause models to express high confidence on low-accuracy outputs, potentially improving AI safety and trustworthiness.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 24

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.

AI · Bullish · OpenAI News · Dec 3 · 6/10 · 5

How confessions can keep language models honest

OpenAI researchers are developing a 'confessions' method to train AI language models to acknowledge their mistakes and undesirable behavior. This approach aims to enhance AI honesty, transparency, and overall trustworthiness in model outputs.

AI · Neutral · arXiv – CS AI · Mar 27 · 5/10

A Unified Memory Perspective for Probabilistic Trustworthy AI

Researchers present a unified framework for probabilistic AI computation that treats deterministic and stochastic data access under a common perspective. The study identifies memory systems as performance bottlenecks in trustworthy AI and proposes compute-in-memory approaches to address scalability challenges.
