y0news

#trustworthy-ai News & Analysis

37 articles tagged with #trustworthy-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option

Researchers developed a method to compute minimum-size abductive explanations for linear models with a reject option, addressing a key challenge in explainable AI for critical domains. The approach uses a log-linear algorithm for accepted instances and integer linear programming for rejected instances, and is reported to be more efficient than existing methods despite the problem's theoretical NP-hardness.
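As a rough illustration of why the accepted case can be handled in log-linear time, here is a minimal sketch for a plain linear scorer with bounded features. It is not the paper's algorithm: the function name `min_size_explanation`, the single acceptance threshold, and the per-feature bounds are all assumptions made for this example. The idea is to free the features whose worst-case effect on the score is smallest, as long as the score cannot fall below the threshold; the features that remain fixed form the explanation.

```python
from typing import List, Sequence, Tuple


def min_size_explanation(
    weights: Sequence[float],
    x: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    threshold: float,
) -> List[int]:
    """Return indices of a smallest feature set that keeps the score >= threshold.

    Hypothetical setup: the instance is accepted as positive when
    sum(w_i * x_i) >= threshold, and feature i may range over bounds[i].
    """
    score = sum(w * v for w, v in zip(weights, x))
    budget = score - threshold  # how much the score may drop
    if budget < 0:
        raise ValueError("instance is not accepted under this threshold")

    # Worst-case score drop if feature i is left free to take any value.
    drops = []
    for i, (w, v, (lo, hi)) in enumerate(zip(weights, x, bounds)):
        worst = min(w * lo, w * hi)
        drops.append((w * v - worst, i))

    # Freeing the cheapest features first maximizes how many can be freed,
    # which minimizes the number of features left in the explanation.
    drops.sort()  # the O(n log n) step
    freed = set()
    used = 0.0
    for drop, i in drops:
        if used + drop > budget:
            break
        used += drop
        freed.add(i)

    return [i for i in range(len(x)) if i not in freed]


explanation = min_size_explanation(
    weights=[2.0, 1.0, 3.0],
    x=[1.0, 1.0, 1.0],
    bounds=[(0.0, 1.0)] * 3,
    threshold=3.0,
)
print(explanation)
```

In this toy instance the score is 6 and the threshold is 3, so features 0 and 1 can both be freed and only feature 2 has to stay fixed. Per the summary above, rejected instances are handled with integer linear programming instead, which this sketch does not cover.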

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9

Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study

Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach consistently performs well across all properties, with attention-based pooling excelling in generalization while second-order pooling provides better sensitivity.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 10

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Researchers developed a new inference-time safety mechanism for code-generating AI models that uses retrieval-augmented generation to identify and fix security vulnerabilities in real time. The approach draws on Stack Overflow discussions to guide code revision without requiring model retraining, improving security while maintaining interpretability.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Calibrating Verbalized Confidence with Self-Generated Distractors

Researchers introduce DINCO (Distractor-Normalized Coherence), a method to improve confidence calibration in large language models by using self-generated alternative claims to reduce overconfidence bias. The approach addresses LLM suggestibility issues that cause models to express high confidence on low-accuracy outputs, potentially improving AI safety and trustworthiness.
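The summary does not spell out the computation, so the following is only a sketch of what distractor normalization could look like, not the authors' implementation. It assumes the model has already verbalized a confidence in [0, 1] for its main claim and, independently, for each self-generated distractor. The function name `normalized_confidence` and the choice to divide by `max(1, total)` (so that confidence is only ever scaled down) are assumptions made for this example.

```python
from typing import Sequence


def normalized_confidence(claim_conf: float, distractor_confs: Sequence[float]) -> float:
    """Scale a verbalized confidence by the total confidence over all candidates.

    Hypothetical rule: if the model's confidences across mutually exclusive
    claims sum to more than 1, it is being incoherent, so the main claim's
    confidence is divided by that total. Totals at or below 1 leave it unchanged.
    """
    values = [claim_conf, *distractor_confs]
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("confidences must lie in [0, 1]")
    total = sum(values)
    return claim_conf / max(1.0, total)


# A suggestible model that also rates two incompatible distractors highly.
print(normalized_confidence(0.9, [0.8, 0.7]))  # 0.9 / 2.4
# A coherent model whose confidences already sum to less than 1.
print(normalized_confidence(0.6, [0.1]))
```

The point of the sketch is that a high raw confidence is discounted when the model is similarly confident in claims that cannot all be true at once.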

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 24

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.

AI · Bullish · OpenAI News · Dec 3 · 6/10 · 5

How confessions can keep language models honest

OpenAI researchers are developing a 'confessions' method to train AI language models to acknowledge their mistakes and undesirable behavior. This approach aims to enhance AI honesty, transparency, and overall trustworthiness in model outputs.

AI · Neutral · arXiv – CS AI · Mar 27 · 5/10

A Unified Memory Perspective for Probabilistic Trustworthy AI

Researchers present a unified framework for probabilistic AI computation that treats deterministic and stochastic data access under a common perspective. The study identifies memory systems as performance bottlenecks in trustworthy AI and proposes compute-in-memory approaches to address scalability challenges.

AI · Neutral · OpenAI News · Jul 15 · 4/10 · 4

Intellectual freedom by design

ChatGPT is positioned as a versatile AI tool designed with three core principles: usefulness, trustworthiness, and adaptability. The design philosophy emphasizes user customization and intellectual freedom in how the AI system can be utilized.
