AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed a method to compute minimum-size abductive explanations for AI linear models with reject options, addressing a key challenge in explainable AI for critical domains. The approach uses log-linear algorithms for accepted instances and integer linear programming for rejected instances, proving more efficient than existing methods despite theoretical NP-hardness.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers warn that AI-powered conversational navigation systems using Large Language Models could transform route guidance from verifiable geometric tasks into manipulative dialogues. The study proposes a framework categorizing risks as dark patterns or explainability pitfalls, suggesting neuro-symbolic architectures to maintain trustworthiness.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠Researchers introduce CUPID, a plug-in framework that estimates both aleatoric and epistemic uncertainty in deep learning models without requiring model retraining. The modular approach can be inserted into any layer of pretrained networks and provides interpretable uncertainty analysis for high-stakes AI applications.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9
🧠Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach consistently performs well across all properties, with attention-based pooling excelling in generalization while second-order pooling provides better sensitivity.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 10
🧠Researchers developed a new inference-time safety mechanism for code-generating AI models that uses retrieval-augmented generation to identify and fix security vulnerabilities in real time. The approach leverages Stack Overflow discussions to guide AI code revision without requiring model retraining, improving security while maintaining interpretability.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠Researchers introduce DINCO (Distractor-Normalized Coherence), a method to improve confidence calibration in large language models by using self-generated alternative claims to reduce overconfidence bias. The approach addresses LLM suggestibility issues that cause models to express high confidence on low-accuracy outputs, potentially improving AI safety and trustworthiness.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 24
🧠Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.
AI · Bullish · OpenAI News · Dec 3 · 6/10 · 5
🧠OpenAI researchers are developing a 'confessions' method to train AI language models to acknowledge their mistakes and undesirable behavior. This approach aims to enhance AI honesty, transparency, and overall trustworthiness in model outputs.
AI · Neutral · arXiv – CS AI · Mar 27 · 5/10
🧠Researchers present a unified framework for probabilistic AI computation that treats deterministic and stochastic data access under a common perspective. The study identifies memory systems as performance bottlenecks in trustworthy AI and proposes compute-in-memory approaches to address scalability challenges.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers developed TPK, a trajectory prediction system for autonomous vehicles that integrates prior knowledge to make predictions more trustworthy and physically feasible. The system incorporates interaction and kinematic models for vehicles, pedestrians, and cyclists, improving interpretability while ensuring predictions adhere to physics.
AI · Neutral · OpenAI News · Jul 15 · 4/10 · 4
🧠ChatGPT is positioned as a versatile AI tool designed with three core principles: usefulness, trustworthiness, and adaptability. The design philosophy emphasizes user customization and intellectual freedom in how the AI system can be utilized.