#llm-applications News & Analysis

54 articles tagged with #llm-applications. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

VideoAgent: All-in-One Framework for Video Understanding and Editing

VideoAgent is an AI framework that automates video understanding and editing at scale, handling complex multi-step editing tasks through a multi-agent orchestration system. The system achieves 87-95% success rates while reducing costs by 60%, and human evaluations rate its output quality only 4% below that of professionally produced videos.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

When Generic Prompt Improvements Hurt: Evaluation-Driven Iteration for LLM Applications

Researchers present the Minimum Viable Evaluation Suite (MVES), a framework for systematically testing LLM applications, and show that generic prompt improvements often fail to deliver consistent gains and can cause significant performance regressions. Testing on local models showed that adding generic rules to prompts degraded RAG citation compliance by up to 70%, underscoring the need for rigorous, task-specific evaluation before deployment.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 8 · 7/10

ReclAIm: A Multi-Agent Framework for Monitoring and Correcting Performance Decline in Medical Imaging AI

Researchers introduced ReclAIm, a multi-agent AI framework using large language models to automatically detect and correct performance degradation in medical imaging classification models. The system successfully restored models experiencing up to 40.6% performance decline to within 2% of baseline values through automated fine-tuning, demonstrating practical viability for maintaining AI reliability in clinical settings.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

DARPA's AI Cyber Challenge (AIxCC, 2023-2025) is the largest competition to date for autonomous cyber reasoning systems powered by large language models, which were tasked with discovering and fixing vulnerabilities in real-world open-source software. This systematic analysis examines competition design, finalist architectures, and performance drivers, revealing both genuine technical advances and remaining limitations in autonomous cybersecurity systems.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation

Researchers introduce LASEV, an LLM-based multi-agent system that generates educational videos by decomposing production into tasks handled by specialized agents rather than relying on end-to-end video models. The system achieves a 95% cost reduction and a throughput of over one million videos per day while maintaining high quality through structured reasoning, semantic critique, and deterministic compilation.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

Researchers propose MAAD (Multi-Agent Architecture Design), a framework using orchestrated AI agents with external knowledge and hierarchical memory to automate software architecture design from requirements. The system outperforms existing approaches and demonstrates that advanced LLMs significantly improve architectural quality and validation efficiency.

🧠 GPT-5

AI · Bullish · OpenAI News · May 28 · 7/10

How Endava builds an agentic organization with Codex

Endava uses Codex to become an agentic organization, automating software development workflows with AI. The approach substantially accelerates delivery timelines and compresses requirements analysis from weeks to hours, signaling a shift toward AI-augmented enterprise operations.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

GENESIS is an AI framework that automates the research and development of 6G cellular networks by converting specifications and research into validated production code through over-the-air testing. The system addresses critical limitations of LLMs in radio access networks by combining AI agents with persistent knowledge management and real-world hardware validation rather than relying solely on simulations.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

Researchers introduce FinAgent-RAG, an advanced AI framework designed to answer complex financial questions by combining iterative retrieval, reasoning, and self-verification. The system achieves 76-78% accuracy on financial benchmarks while reducing computational costs by 41%, demonstrating practical viability for institutional financial analysis.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

Researchers introduce BeliefSim, a framework that uses large language models to simulate how susceptible different demographic groups are to misinformation, based on their underlying beliefs. By incorporating psychology-informed belief profiles, the system predicts misinformation susceptibility with up to 92% accuracy.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search

Researchers introduce LLM-ACES, a framework combining large language models with active learning to discover governing equations of dynamical systems from data. The approach achieves significant improvements in accuracy and sample efficiency by using LLM-proposed hypotheses to guide strategic data acquisition, outperforming existing methods on 122 ODE systems while requiring substantially less training data.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

ReviewGuard: Aligning LLM-Assisted Peer Review with Long-Term Scientific Impact

Researchers introduce ReviewGuard, an LLM-based framework that predicts long-term scientific impact rather than mimicking human peer reviewers. Testing on 20,861 AI/ML papers shows ReviewGuard correlates 5.6x better with future citations than human reviewers and identifies high-impact rejected papers at significantly higher rates, suggesting AI can complement editorial decision-making without replacing human judgment.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

AI-Assisted Computational Reproducibility on the FABRIC Testbed

Researchers demonstrate that combining the FABRIC testbed with LLM-based coding assistants can significantly reduce the effort required to reproduce published scientific experiments. The AI-assisted approach achieved a 4-6x reduction in reproduction effort across three case studies, though human intervention was still required for complex analytical workflows.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

From Empirical Evaluation to Context-Aware Enhancement: Repairing Regression Errors with LLMs

Researchers introduce RegressionBug4APR, a benchmark of 200 real-world Java and Python regression bugs, to evaluate automated program repair (APR) techniques. The study finds that traditional APR tools fail entirely on regression bugs, while LLM-based approaches show promise, achieving 1.6x better results when enhanced with bug-inducing change context.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

ThermoLLM: Thermodynamics-Aware HVAC Control with Spatial-Semantic Knowledge Graph

Researchers present ThermoLLM, a Large Language Model-based framework for multi-zone HVAC control that integrates thermodynamic physics and spatial building semantics through a knowledge graph. The system outperforms standard baselines and competing LLM approaches by reasoning about zone coupling and thermal interactions, achieving superior energy-comfort trade-offs in building simulations.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Tell Me: An LLM-powered Mental Well-being Assistant with RAG, Synthetic Dialogue Generation, and Agentic Planning

Researchers have developed Tell Me, an LLM-powered mental health support system that combines retrieval-augmented generation for personalized dialogue, synthetic therapist-client conversation generation for research purposes, and an agentic AI crew for creating adaptive self-care plans. The system demonstrates how large language models can expand access to mental well-being resources while making clear that it complements, rather than replaces, professional therapy.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Artificial Intelligence in Ship Finance: Applications, Opportunities, and a Case Study in AI-Augmented Loan Origination

Researchers present ShipFinance.ai, an AI-powered system that uses large language models to streamline ship finance loan origination by automating document processing, information extraction, and workflow management across the complex maritime lending process. The system addresses growing complexity in the sector driven by environmental regulations and ESG reporting requirements, offering maritime finance professionals tools to manage increasingly sophisticated underwriting processes.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

TAROT: Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning

TAROT is a new GNN-based framework that improves few-shot tabular learning by constructing task-adaptive semantic graphs from LLM-inferred feature relationships. The approach addresses the privacy concerns of having LLMs process tabular data directly, and achieves state-of-the-art performance on few-shot benchmarks through graph refinement that filters out hallucinated relationships.

AI · Bullish · arXiv – CS AI · Jun 10 · 6/10

MetaPlate: Counterfactual-Guided RAG-LLM Tool for Personalized Food Recommendation and Hyperglycemia Prevention

MetaPlate is an AI-powered dietary decision-support system that combines counterfactual explanations, continuous glucose monitoring data, and large language models to generate personalized meal recommendations for preventing postprandial hyperglycemia. The system demonstrated improved clinical plausibility and actionability through expert validation with registered dietitians, showcasing how domain-specific constraints enhance LLM reliability in healthcare applications.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

Researchers propose Causal Ensemble Agent (CEA), a framework that combines multiple causal discovery algorithms with LLM-guided expert reweighting to improve accuracy in identifying causal relationships from data. The approach addresses limitations of existing methods by dynamically weighting statistical insights and leveraging domain knowledge, demonstrating superior performance across synthetic and real-world datasets.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Position: The ML Community Must Build an AI-Augmented Peer-Review Ecosystem

A position paper argues that the machine learning community must develop an AI-augmented peer-review ecosystem to address the crisis of scale in scientific publishing. With manuscript submissions growing exponentially faster than the pool of qualified reviewers at premier ML venues, the authors propose using LLMs as collaborators, not replacements, to support factual verification, reviewer performance, authors' efforts to improve their manuscripts, and administrative decision-making while maintaining scientific integrity.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Researchers propose an optimized system for running vision-language models on UAVs in low-altitude networks, combining resource allocation algorithms with LLM-enhanced reinforcement learning to minimize latency and power consumption while maintaining inference accuracy. The framework addresses a critical challenge in aerial IoT applications where onboard computational constraints and dynamic network conditions limit real-time multimodal data processing.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

WhiteTesseract combines extended reality (XR) and conversational AI to enhance cultural heritage exhibitions by enabling personalized, context-aware interpretation of artworks while preserving the physical viewing experience. A controlled study at a Monet exhibition demonstrated that the system nearly tripled average viewing time (35.3 to 98.3 seconds) and prompted 60% of visitor-AI interactions to move beyond factual queries into analytical and emotional engagement.

🧠 Claude

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Towards Long-Horizon Vessel Trajectory and Destination Forecasting with Reasoning Large Language Models

Researchers develop a large language model framework for predicting vessel trajectories and destinations up to 30 days in advance using reinforcement learning with verifiable rewards. The approach outperforms traditional deep learning methods by maintaining route feasibility and destination accuracy over extended maritime forecasting horizons.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

LLM-Orchestrated Conformance Checking in Stroke Care Without Computer-Interpretable Guidelines

Researchers developed an LLM-orchestrated framework that automates conformance checking in healthcare by extracting patient care pathways and clinical guidelines from unstructured text, eliminating the need for formal Computer-Interpretable Guidelines. Testing at Alessandria Hospital's neurological ward showed 86% of stroke care traces adhered to clinical guidelines, demonstrating practical feasibility of AI-driven healthcare compliance assessment.