AIBearisharXiv – CS AI · 2d ago7/10
🧠A physicist supervised Claude AI models over 12 days to build CLAX-PT, a physics simulation module, documenting how AI agents struggle with architectural redesign and distinguishing symptom-fixes from root-cause solutions. The study reveals that supervision design and human domain expertise, rather than model capability alone, determine whether AI-generated scientific code produces trustworthy results.
🧠 Claude
AIBullisharXiv – CS AI · 4d ago7/10
🧠Researchers introduce PaTAS (Parallel Trust Assessment System), a framework that uses Subjective Logic to measure and propagate trust through neural networks alongside standard computation. The system identifies reliability gaps and adversarial vulnerabilities that traditional metrics like accuracy fail to detect, offering a foundation for deploying AI safely in critical applications.
AIBearisharXiv – CS AI · May 127/10
🧠Researchers identified epistemic overreach in LLM-generated explanations of personal sensing data, where AI systems produce coherent-sounding narratives about anomalous days without sufficient evidentiary support. Testing 14,922 explanations across three LLM families revealed that models routinely attribute causes without data justification, and this problem persists even when provided richer context or explicit instructions to constrain claims.
🧠 Llama
AINeutralarXiv – CS AI · May 127/10
🧠Researchers identify critical honesty failures in Large Language Model unlearning methods, where models hallucinate or behave inconsistently after attempting to forget harmful training data. They propose ReVa, a representation-alignment procedure that significantly improves model honesty by better acknowledging forgotten knowledge while maintaining utility on retained information.
AIBearisharXiv – CS AI · May 17/10
🧠Researchers audited five frontier vision-language models (including GPT-5, Gemini 2.5 Pro, and Qwen 2.5 VL) on medical visual question answering tasks and found critical failures in anatomical localization and grounding that pose clinical safety risks. While supervised fine-tuning improved VQA accuracy to 85.5% on benchmark datasets, the underlying perception bottleneck—poor object detection and format compliance issues—remains largely unresolved.
🧠 GPT-5🧠 Gemini
AIBearisharXiv – CS AI · Apr 157/10
🧠Researchers introduce RT-LRM, a comprehensive benchmark for evaluating the trustworthiness of Large Reasoning Models across truthfulness, safety, and efficiency dimensions. The study reveals that LRMs face significant vulnerabilities including CoT-hijacking and prompt-induced inefficiencies, demonstrating they are more fragile than traditional LLMs when exposed to reasoning-induced risks.
AIBullisharXiv – CS AI · Apr 147/10
🧠FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.
AIBearisharXiv – CS AI · Apr 107/10
🧠Researchers introduce the Graded Color Attribution dataset to test whether Vision-Language Models faithfully follow their own stated reasoning rules. The study reveals that VLMs systematically violate their introspective rules in up to 60% of cases, while humans remain consistent, suggesting VLM self-knowledge is fundamentally miscalibrated with serious implications for high-stakes deployment.
🧠 GPT-5
AIBearisharXiv – CS AI · Mar 47/102
🧠Researchers have developed TrustMH-Bench, a comprehensive framework to evaluate the trustworthiness of Large Language Models (LLMs) in mental health applications. Testing revealed that both general-purpose and specialized mental health LLMs, including advanced models like GPT-5.1, significantly underperform across critical trustworthiness dimensions in mental health scenarios.
AIBullishOpenAI News · Jul 217/105
🧠OpenAI and other leading AI laboratories are strengthening AI governance through voluntary commitments focused on safety, security, and trustworthiness. This represents a proactive industry approach to self-regulation in AI development.
AIBearisharXiv – CS AI · Mar 116/10
🧠Researchers argue that trust in chatbots is often driven by behavioral manipulation rather than demonstrated trustworthiness, proposing they be viewed as skilled salespeople rather than assistants. The study highlights how design choices exploit cognitive biases to influence user behavior, creating a gap between psychological trust formation and actual trustworthiness.
AIBullisharXiv – CS AI · Mar 26/1010
🧠Researchers developed the TREC 2025 DRAGUN Track to evaluate AI systems that help readers assess news trustworthiness through automated report generation. The initiative created reusable evaluation resources including human-assessed rubrics and an AutoJudge system that correlates well with human evaluations for RAG-based news analysis tools.