y0news

#human-in-the-loop News & Analysis

21 articles tagged with #human-in-the-loop. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Syll: Open-Source Personal Automation with Cross-Surface Execution

Syll is an open-source, self-hosted AI agent framework for personal automation across multiple surfaces: APIs, CLIs, web browsers, and desktop applications. Users teach agents by direct demonstration, and the system compiles the demonstrated actions into reusable skills. Multimodal logging and local artifact storage keep each run transparent, so users can inspect and control what the agent does.

AI · Neutral · arXiv – CS AI · 2d ago · 7/10

Measuring Agents in Production

A study of deployed LLM-based agents across 26 domains finds that production systems rely on simple, human-centered approaches rather than complex automation. In the surveyed deployments, 68% of agents require human intervention within 10 steps, and 70% rely on prompt engineering rather than model fine-tuning. Reliability remains the primary development challenge, which teams address through systems-level design.

AI · Neutral · arXiv – CS AI · May 29 · 7/10

NOVA: Fundamental Limits of Knowledge Discovery Through AI

Researchers introduce the NOVA framework, which models AI knowledge discovery as an adaptive sampling process and identifies fundamental limits on how it scales. The analysis reveals a "contamination trap": as the remaining knowledge becomes scarce, false positives accumulate faster than genuine discoveries. Cumulative generation costs follow a Zipf-distributed scaling law, implying asymptotically diminishing returns.
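The diminishing-returns claim can be illustrated with a toy calculation. This is not the NOVA model: it assumes, purely for illustration, that discoverable facts are sampled with Zipf-distributed probabilities and are found in order of probability. Under those assumptions, the expected number of samples needed to surface the next new fact is 1 divided by the probability mass not yet discovered, so each discovery costs more than the last. The function names are hypothetical.

```python
def zipf_probs(n: int, s: float = 1.0) -> list[float]:
    """Normalized Zipf probabilities, p_i proportional to 1 / i**s for i = 1..n."""
    weights = [1.0 / (i ** s) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def marginal_costs(n: int, s: float = 1.0) -> list[float]:
    """Expected draws needed to find each new item (hypothetical toy model).

    Simplifying assumption: items are discovered in rank order. With items
    1..k-1 already known, a draw is new with probability equal to the
    remaining mass, so the wait is geometric with mean 1 / remaining.
    """
    costs = []
    remaining = 1.0
    for p in zipf_probs(n, s):
        costs.append(1.0 / remaining)
        remaining -= p
    return costs


if __name__ == "__main__":
    cumulative = 0.0
    for k, c in enumerate(marginal_costs(1000), start=1):
        cumulative += c
        if k in (1, 10, 100, 500, 1000):
            print(f"item {k:4d}: marginal {c:8.1f}  cumulative {cumulative:10.1f}")
```

For `s = 1`, the expected cost of the final item equals n times the n-th harmonic number, roughly 7,485 draws when n = 1000, while the first item costs a single draw.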

AI · Bearish · arXiv – CS AI · May 29 · 7/10

Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

Researchers present an empirical study showing that large language models (LLMs) struggle with cyber threat intelligence (CTI) tasks because of domain-specific vulnerabilities rather than generic AI failures. The study identifies three failure modes (spurious correlations, contradictory knowledge, and constrained generalization) and proposes targeted defenses to make LLMs more reliable in security operations.

AI · Bullish · arXiv – CS AI · 9h ago · 6/10

Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs

Researchers introduce flow control, a technique for steering vision-language-action (VLA) models in real time through simple user inputs, such as keyboard presses, without retraining the model. Users can guide robot actions toward their intent while the outputs stay high quality and aligned with the expert distribution the model learned, which improves task success rates and shortens completion times.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration

Researchers introduced DeskCraft, a new benchmark for evaluating AI desktop agents on complex, long-horizon professional workflows in creative and engineering software. The study reveals significant performance gaps, with GPT-4 achieving only 31.6% accuracy on standard tasks and 27.6% on interactive tasks requiring human collaboration, highlighting challenges in multi-step automation and proactive agent communication.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

Researchers developed a hybrid system that combines formal symbolic planning with large language models to improve capability-based planning in industrial automation. The system adds natural-language interaction, explainability, and adaptation of the knowledge model subject to human approval. It achieves high accuracy across planning and query tasks while maintaining formal correctness guarantees.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

SmartIterator is a visual analytics framework that helps data scientists systematically evaluate and choose between multiple unsupervised learning results across parameter sweeps. The approach operationalizes structured six-phase workflows for three clustering and topic-modeling method families, enabling informed decision-making by visualizing data grouping quality, stability, membership confidence, and domain context simultaneously.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Generating Robust Portfolios of Optimization Models using Large Language Models

Researchers propose an algorithm that uses large language models to generate portfolios of optimization models rather than single outputs, addressing the reliability gap in LLM-generated solutions. The method leverages LLMs in dual roles—as generative and evaluative components—with theoretical guarantees that high-quality candidates appear in the portfolio as long as either role aligns with human preferences.
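A simplified way to see why a portfolio can tolerate one misaligned role: keep the top-ranked candidates from both the generator's scores and the evaluator's scores. If either ranking puts the human-preferred model near the top, that model ends up in the portfolio. This sketch assumes every candidate already carries a generator score (for example, a sampling likelihood) and an evaluator score; `top_k` and `build_portfolio` are hypothetical helpers, not the paper's algorithm or its guarantee.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring candidate IDs, breaking ties by ID."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def build_portfolio(gen_scores: dict[str, float],
                    eval_scores: dict[str, float],
                    k: int) -> list[str]:
    """Union of the generator's and the evaluator's top-k candidates.

    If either ranking places the human-preferred candidate in its top k,
    that candidate is guaranteed to appear in the returned portfolio.
    """
    portfolio = top_k(gen_scores, k)
    for cand in top_k(eval_scores, k):
        if cand not in portfolio:
            portfolio.append(cand)
    return portfolio


if __name__ == "__main__":
    # Hypothetical candidate formulations A-D. The generator favors A,
    # while the evaluator (which here matches the human's choice) favors C.
    gen = {"A": 0.9, "B": 0.5, "C": 0.2, "D": 0.1}
    ev = {"A": 0.3, "B": 0.4, "C": 0.8, "D": 0.6}
    print(build_portfolio(gen, ev, k=1))  # ['A', 'C']
```

The portfolio holds at most 2k candidates, so the price of robustness is a larger set for a person or a downstream check to choose from.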

AI · Bullish · arXiv – CS AI · May 11 · 6/10

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

Researchers have developed an AI Teaching & Learning Assistant, a Moodle plugin using Retrieval-Augmented Generation (RAG) to provide students with Socratic tutoring while enabling educators to supervise content generation. The system grounds LLM responses in teacher-provided materials to minimize hallucinations and misinformation, achieving high faithfulness scores (0.97) and strong user satisfaction (4.00/5.00 rating).

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Researchers propose a trust framework for AI agent skills—reusable code packages that extend language models—treating them as untrusted by default until verified. The approach introduces verification levels, capability gates, and correctness criteria to enable sustainable human-in-the-loop oversight without operational bottlenecks.
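The default-deny idea can be sketched as an ordered set of verification levels plus a table of capability gates. A newly installed skill starts unverified and may only use capabilities whose gate it meets; any capability not in the table is blocked. The level names, capability names, and thresholds below are hypothetical placeholders, not the trust schema defined in the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class VerificationLevel(IntEnum):
    """Hypothetical ordered trust levels; higher means more scrutiny passed."""
    UNVERIFIED = 0      # default for every newly installed skill
    HUMAN_REVIEWED = 1  # a person has read the skill's code
    TESTED = 2          # reviewed and passing an agreed test suite


# Hypothetical capability gates: minimum level required to use each capability.
CAPABILITY_GATES = {
    "read_workspace": VerificationLevel.UNVERIFIED,
    "network": VerificationLevel.HUMAN_REVIEWED,
    "write_files": VerificationLevel.HUMAN_REVIEWED,
    "run_shell": VerificationLevel.TESTED,
}


@dataclass
class Skill:
    name: str
    capabilities: set[str]
    level: VerificationLevel = VerificationLevel.UNVERIFIED  # untrusted by default


def blocked_capabilities(skill: Skill) -> set[str]:
    """Capabilities the skill requests but is not yet verified enough to use.

    Capabilities missing from CAPABILITY_GATES are always blocked (deny by default).
    """
    blocked = set()
    for cap in skill.capabilities:
        required = CAPABILITY_GATES.get(cap)
        if required is None or skill.level < required:
            blocked.add(cap)
    return blocked


if __name__ == "__main__":
    skill = Skill("sync_notes", {"read_workspace", "network"})
    print(blocked_capabilities(skill))  # {'network'}
    skill.level = VerificationLevel.HUMAN_REVIEWED  # a person signs off
    print(blocked_capabilities(skill))  # set()
```

In this sketch, raising a skill's level is the single point where a human signs off, so review effort concentrates on skills that request sensitive capabilities.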

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Modeling Clinical Concern Trajectories in Language Model Agents

Researchers introduce a lightweight LLM agent architecture that uses first- and second-order state dynamics to model gradual clinical concern escalation rather than abrupt threshold-based responses. The approach makes AI decision-making more transparent by revealing sustained risk signals before escalation, enabling better human oversight in clinical settings.
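A minimal sketch of the idea, assuming one scalar risk signal in [0, 1] per observation: a first-order state (the smoothed concern level) absorbs isolated spikes, and a second-order state (the smoothed rate of change) exposes rising concern before escalation. This is a standard double exponential smoother chosen for illustration; the constants, thresholds, and the `watch` label are hypothetical, not the paper's architecture.

```python
def classify(signals: list[float], alpha: float = 0.3, beta: float = 0.5,
             watch_at: float = 0.2, escalate_at: float = 0.6) -> list[str]:
    """Label each observation 'stable', 'watch', or 'escalate'.

    level: first-order state, an exponential moving average of the risk signal.
    trend: second-order state, a smoothed estimate of how fast level is changing.
    """
    level, trend = 0.0, 0.0
    labels = []
    for s in signals:
        prev = level
        level = (1 - alpha) * level + alpha * s
        trend = (1 - beta) * trend + beta * (level - prev)
        if level >= escalate_at:
            labels.append("escalate")
        elif level >= watch_at and trend > 0:
            labels.append("watch")  # rising concern, visible before escalation
        else:
            labels.append("stable")
    return labels


if __name__ == "__main__":
    print(classify([1, 0, 0, 0, 0]))  # ['watch', 'watch', 'stable', 'stable', 'stable']
    print(classify([1, 1, 1, 1, 1]))  # ['watch', 'watch', 'escalate', 'escalate', 'escalate']
```

A single alarming reading produces a brief `watch` but never escalates, while a sustained signal passes through `watch` and escalates on the third observation, which is the gradual, inspectable behavior the summary describes.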

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

Researchers propose an approach based on the reactor model of computation, using the Lingua Franca framework, to address nondeterminism in AI-powered human-in-the-loop cyber-physical systems. Using an agentic driving coach as a case study, the work shows how foundation models such as LLMs can be deployed in safety-critical applications while keeping system behavior deterministic despite unpredictable human and environmental inputs.
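The core determinism idea can be illustrated outside Lingua Franca: give every event a logical timestamp and process events in a fixed total order, so the result does not depend on when messages physically arrive. This is plain Python written for illustration, not Lingua Franca syntax or its runtime; the event sources and values are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Event(NamedTuple):
    tag: int       # logical time in ms, assigned by the sender, not arrival time
    priority: int  # fixed precedence among handlers that share a tag
    source: str    # hypothetical sources, e.g. "driver" or "llm_coach"
    value: str


def process(events: list[Event]) -> list[str]:
    """Handle events in a fixed total order: (tag, priority, source, value).

    NamedTuples compare field by field, so sorted() yields the same sequence
    for any arrival order of the same events.
    """
    return [f"t={e.tag} {e.source}: {e.value}" for e in sorted(events)]


if __name__ == "__main__":
    arrivals = [
        Event(100, 1, "llm_coach", "suggest: brake earlier"),
        Event(100, 0, "driver", "steering +5deg"),
        Event(50, 0, "sensor", "obstacle ahead"),
    ]
    outcomes = {tuple(process(list(p))) for p in permutations(arrivals)}
    print(len(outcomes))  # 1: every arrival order gives the same trace
```

A real runtime must also decide when it is safe to process a tag while inputs may still be in flight; this sketch assumes all events are already known.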

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments

Researchers developed a human-in-the-loop LLM system for grading handwritten mathematics assessments that reduces grading time by 23% while maintaining accuracy comparable to manual grading. The system combines automated scanning, multi-pass LLM scoring, consistency checks, and mandatory human verification to handle pen-and-paper tests at scale.
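The per-answer flow can be sketched as follows: several independent LLM scoring passes produce draft scores, a consistency check compares them, and every result is queued for a human grader, with inconsistent or out-of-range ones flagged for closer review. The LLM calls themselves are omitted, and the function name, tolerance, and status fields are hypothetical, not the authors' implementation.

```python
from statistics import mean


def grade_item(pass_scores: list[float], max_points: float,
               tolerance: float = 0.5) -> dict:
    """Combine several LLM scoring passes for one answer into a draft grade.

    The draft is never final: every item is routed to a human grader, and
    inconsistent or out-of-range scores are marked high priority.
    """
    if not pass_scores:
        raise ValueError("need at least one scoring pass")
    in_range = all(0 <= s <= max_points for s in pass_scores)
    consistent = in_range and max(pass_scores) - min(pass_scores) <= tolerance
    return {
        "draft_score": round(mean(pass_scores), 2),
        "consistent": consistent,
        "status": "pending_human_review",  # mandatory human verification
        "priority": "low" if consistent else "high",
    }


if __name__ == "__main__":
    print(grade_item([4.0, 4.0, 4.5], max_points=5))  # passes agree
    print(grade_item([2.0, 4.0, 3.0], max_points=5))  # disagreement flagged
```

Because every item still reaches a person, the consistency check only orders the review queue; it never decides which grades skip verification.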

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations

Researchers introduce PONTE, a human-in-the-loop framework that creates personalized, trustworthy AI explanations by combining user preference modeling with verification modules. The system addresses the challenge of one-size-fits-all AI explanations by adapting to individual user expertise and cognitive needs while maintaining faithfulness and reducing hallucinations.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots

Researchers developed a framework for analyzing AI diagnostic systems in clinical settings by preserving original AI inferences and comparing them with physician corrections. The study of 21 dermatological cases showed 71.4% exact agreement between AI and physicians, with 100% comprehensive concordance when using structured analysis methods.
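A sketch of the snapshot idea, assuming each case yields one AI diagnosis label and one physician-confirmed label: the AI output is stored in a frozen record so later corrections cannot overwrite it, and agreement is computed against that untouched snapshot. For scale, 71.4% of 21 cases corresponds to 15 exact matches (15 / 21 ≈ 0.714). The class names, fields, and exact string matching are hypothetical simplifications.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class InferenceSnapshot:
    """The AI's original output, preserved as-is (frozen: fields cannot be reassigned)."""
    case_id: str
    ai_diagnosis: str
    model_version: str


@dataclass(frozen=True)
class PhysicianReview:
    case_id: str
    final_diagnosis: str


def exact_agreement(snapshots: list[InferenceSnapshot],
                    reviews: list[PhysicianReview]) -> float:
    """Fraction of snapshots whose diagnosis label exactly matches the physician's."""
    if not snapshots:
        raise ValueError("no snapshots to compare")
    final = {r.case_id: r.final_diagnosis for r in reviews}
    matches = sum(1 for s in snapshots if final.get(s.case_id) == s.ai_diagnosis)
    return matches / len(snapshots)


if __name__ == "__main__":
    snap = InferenceSnapshot("case-01", "melanoma", "model-v1")
    try:
        snap.ai_diagnosis = "nevus"  # attempted overwrite after a correction
    except FrozenInstanceError:
        print("snapshot is immutable")
```

Keeping the physician's correction in a separate record, rather than editing the AI output, is what lets the original inference be audited later.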

AI · Neutral · arXiv – CS AI · Apr 13 · 5/10

MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator

MuTSE is an interactive web application designed to evaluate Large Language Model outputs for text simplification tasks across multiple prompting strategies and proficiency levels. The tool addresses a methodological gap in NLP research by providing researchers and educators with a structured, visual framework for comparing prompt-model combinations in real-time.

AI · Bullish · arXiv – CS AI · Apr 7 · 4/10

CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation

Researchers developed CODE-GEN, a human-in-the-loop AI system that uses retrieval-augmented generation to create multiple-choice programming questions for educational purposes. The system achieved 79.9% to 98.6% success rates across seven pedagogical dimensions when evaluated by subject-matter experts, demonstrating strong performance in computational verification tasks while still requiring human expertise for complex instructional design.

AI · Neutral · arXiv – CS AI · Mar 12 · 5/10

Context Over Compute: Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality

Research comparing human-in-the-loop and automated chain-of-thought prompting for behavioral interview evaluation found that human involvement significantly outperforms automated methods. The human-in-the-loop approach required 5x fewer iterations, achieved a 100% success rate versus 84% for automated methods, and produced substantial improvements in confidence and authenticity scores.