AI · Neutral · arXiv – CS AI · 11h ago · 7/10
🧠Researchers introduce MENTOR, a metacognition-driven framework that addresses a critical vulnerability in Large Language Models: an average jailbreak success rate of 57.8% across domain-specific risks in education, finance, and management. The framework uses self-assessment and consequential reasoning to identify model misalignments, then applies dynamic rule-based steering to substantially reduce attack success rates, outperforming existing safety alignment methods.
AI · Neutral · arXiv – CS AI · 2d ago · 7/10
🧠Researchers introduced a new benchmark for evaluating large language models' reasoning capabilities through interactive games where LLMs must query hidden environments, integrate observations, and adapt strategies. The framework reveals significant performance gaps among frontier models in both success rates and interaction efficiency, with contextual perturbations causing moderate declines but metacognitive tasks producing much larger performance drops.
AI · Neutral · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduced MEDLEY-BENCH, a new AI benchmark that evaluates metacognition—an AI model's ability to monitor and revise its own reasoning. The study found that while larger models evaluate their reasoning better, they don't actually control their outputs more effectively, and smaller models often match larger ones in metacognitive tasks, suggesting scale alone doesn't determine reasoning quality.
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers developed new methods to quantitatively measure metacognitive abilities in large language models, finding that frontier LLMs since early 2024 show increasing evidence of self-awareness capabilities. The study reveals that these abilities are limited in resolution and qualitatively different from human metacognition, and that they vary across models in ways that suggest post-training influences their development.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce the Emotion-Gradient Metacognitive Recursive Self-Improvement (EG-MRSI) framework, a theoretical architecture for AI systems that can safely modify their own learning algorithms. The framework integrates metacognition, emotion-based motivation, and self-modification with formal safety constraints, representing foundational research toward safe artificial general intelligence.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers propose Metacognitive Behavioral Tuning (MBT), a new framework that addresses structural fragility in Large Reasoning Models by injecting human-like self-regulatory control into AI thought processes. The approach reduces reasoning collapse and improves accuracy while consuming fewer computational tokens across multi-hop question-answering benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers propose EGPO, a new framework that improves large reasoning models by incorporating uncertainty awareness into reinforcement learning training. The approach addresses the "uncertainty-reward mismatch" where current training methods treat high and low-confidence solutions equally, preventing models from developing better reasoning capabilities.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers developed a reinforcement learning system that strategically controls when students can access generative AI tools during learning tasks. In a controlled study of 105 students, timed GenAI access outperformed both unrestricted use and complete restriction, improving test performance and metacognitive accuracy while reducing errors and task duration.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers establish formal mathematical bounds for when human-AI teams outperform individuals, proving complementarity occurs only when error correlation between humans and AI falls below a critical threshold. The framework explains why 70% of real-world human-AI collaborations fail to achieve synergy and provides predictive formulas validated against human datasets.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce the Metacognitive Probe, a diagnostic tool measuring five dimensions of LLM confidence behavior including calibration, epistemic vigilance, and reasoning validation. Testing on eight frontier models and 69 humans reveals significant within-model disparities—exemplified by Gemini 2.5 Flash scoring 88 on confidence calibration but only 41 on difficulty prediction—suggesting composite benchmarks mask pockets of overconfidence.
🧠 Gemini
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers evaluated metacognitive monitoring across 33 frontier LLMs using 47,151 MMLU benchmark items, finding significant domain-level variation masked by aggregate performance scores. Applied/Professional knowledge domains showed consistently strong self-monitoring (AUROC .742), while Formal Reasoning and Natural Science proved most challenging, with implications for targeted model deployment.
🏢 OpenAI · 🏢 Anthropic · 🧠 Gemini
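For readers unfamiliar with the AUROC figure quoted above, the following is a minimal sketch of how a monitoring AUROC can be computed from per-item confidence and correctness. It is not the study's code: the function name `monitoring_auroc` and the toy data are assumptions made for illustration. The value is the probability that a randomly chosen correct answer received higher confidence than a randomly chosen incorrect one, so 0.5 means confidence carries no information about correctness and 1.0 means perfect separation.

```python
def monitoring_auroc(confidences, correct):
    """AUROC of confidence as a predictor of correctness.

    Computed by direct pairwise comparison: the fraction of
    (correct, incorrect) item pairs in which the correct item has the
    higher confidence, with ties counted as half a win.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect item")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): six items with a stated
# confidence and whether the answer was correct.
toy_confidences = [0.9, 0.8, 0.6, 0.7, 0.4, 0.3]
toy_correct = [True, True, True, False, False, False]

toy_auroc = monitoring_auroc(toy_confidences, toy_correct)
print(round(toy_auroc, 3))
```

The pairwise loop is quadratic in the number of items, which is fine for a small illustration; a rank-based formulation would be the usual choice for tens of thousands of items.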
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers investigated whether self-monitoring mechanisms (metacognition, self-prediction, duration estimation) improve reinforcement learning agents in predator-prey environments. Initial auxiliary-loss implementations provided no benefits, but structurally integrating these modules into decision pathways showed modest improvements, suggesting effective AI enhancement requires architectural embedding rather than add-on approaches.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers developed a novel Co-Regulation Design Agentic Loop (CRDAL) system that uses metacognitive agents to improve AI-driven engineering design by reducing design fixation. The system showed better performance than traditional approaches in battery pack design tasks without significantly increasing computational costs.
AI · Neutral · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers introduce a new framework to evaluate how well Large Language Models understand their own knowledge limitations, finding that traditional confidence metrics miss key differences between models. The study reveals that models with similar accuracy can have vastly different metacognitive abilities, that is, different capacities to know what they don't know.
🧠 Llama
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Research reveals that LLMs heavily concentrate their confidence scores on just three round numbers when using standard 0-100 scales, with over 78% of responses showing this pattern. The study demonstrates that using a 0-20 confidence scale significantly improves metacognitive efficiency compared to the conventional 0-100 format.
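The concentration effect described above can be checked on any set of elicited confidence scores. The sketch below is an illustration under stated assumptions, not the study's method: the helper name `top_k_share` and the toy scores are hypothetical, and the summary does not say which three values dominate, so the code reports whichever values are most frequent.

```python
from collections import Counter


def top_k_share(scores, k=3):
    """Return (share, values): the fraction of responses that fall on the
    k most frequent confidence values, and those values in frequency order."""
    if not scores:
        raise ValueError("scores must not be empty")
    top = Counter(scores).most_common(k)
    share = sum(count for _, count in top) / len(scores)
    values = [value for value, _ in top]
    return share, values


# Hypothetical toy responses on a 0-100 scale (made up for illustration).
toy_scores = [90, 90, 90, 80, 80, 95, 95, 95, 95, 70]

share, values = top_k_share(toy_scores, k=3)
print(share, values)
```

A share close to 1.0 means the scale is effectively being used as a three-point scale, which is the kind of pattern the summary reports for the 0-100 format.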