y0news

#llm-applications News & Analysis

47 articles tagged with #llm-applications. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 2d ago · 7/10

When Generic Prompt Improvements Hurt: Evaluation-Driven Iteration for LLM Applications

Researchers present the Minimum Viable Evaluation Suite (MVES), a framework for systematically testing LLM applications, revealing that generic prompt improvements often fail to deliver consistent gains and can cause significant performance regressions. Testing on local models showed that adding generic rules to prompts degraded RAG citation compliance by up to 70%, underscoring the need for rigorous, task-specific evaluation before deployment.

🧠 Llama
AI · Bullish · arXiv – CS AI · 5d ago · 7/10

ReclAIm: A Multi-Agent Framework for Monitoring and Correcting Performance Decline in Medical Imaging AI

Researchers introduced ReclAIm, a multi-agent AI framework using large language models to automatically detect and correct performance degradation in medical imaging classification models. The system successfully restored models experiencing up to 40.6% performance decline to within 2% of baseline values through automated fine-tuning, demonstrating practical viability for maintaining AI reliability in clinical settings.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

Researchers propose MAAD (Multi-Agent Architecture Design), a framework using orchestrated AI agents with external knowledge and hierarchical memory to automate software architecture design from requirements. The system outperforms existing approaches and demonstrates that advanced LLMs significantly improve architectural quality and validation efficiency.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

DARPA's AI Cyber Challenge (AIxCC, 2023-2025) represents the largest competition to date for autonomous cyber reasoning systems powered by large language models, tasked with discovering and fixing vulnerabilities in real-world open-source software. This systematic analysis examines competition design, finalist architectures, and performance drivers, revealing both genuine technical advances and remaining limitations in autonomous cybersecurity systems.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation

Researchers introduce LASEV, an LLM-based multi-agent system that generates educational videos by decomposing production into specialized agents rather than relying on end-to-end video models. The system achieves a 95% cost reduction and produces over one million videos daily while maintaining high quality through structured reasoning, semantic critique, and deterministic compilation.

AI · Bullish · OpenAI News · May 28 · 7/10

How Endava builds an agentic organization with Codex

Endava leverages Codex to transform into an agentic organization, enabling AI-driven automation of software development workflows. The approach dramatically accelerates delivery timelines and compresses requirements analysis from weeks to mere hours, signaling a shift toward AI-augmented enterprise operations.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

GENESIS is an AI framework that automates the research and development of 6G cellular networks by converting specifications and research into validated production code through over-the-air testing. The system addresses critical limitations of LLMs in radio access networks by combining AI agents with persistent knowledge management and real-world hardware validation rather than relying solely on simulations.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

Researchers introduce FinAgent-RAG, an advanced AI framework designed to answer complex financial questions by combining iterative retrieval, reasoning, and self-verification. The system achieves 76-78% accuracy on financial benchmarks while reducing computational costs by 41%, demonstrating practical viability for institutional financial analysis.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

Researchers introduce BeliefSim, a framework that uses large language models to simulate how susceptible different demographic groups are to misinformation, based on their underlying beliefs. The system achieves up to 92% accuracy in predicting misinformation susceptibility by incorporating psychology-informed belief profiles.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Artificial Intelligence in Ship Finance: Applications, Opportunities, and a Case Study in AI-Augmented Loan Origination

Researchers present ShipFinance.ai, an AI-powered system using large language models to streamline ship finance loan origination by automating document processing, information extraction, and workflow management across complex maritime lending. The system addresses growing complexity in the sector driven by environmental regulations and ESG reporting requirements, offering maritime finance professionals tools to manage increasingly sophisticated underwriting processes.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

TAROT: Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning

TAROT is a new GNN-based framework that improves few-shot tabular learning by constructing task-adaptive semantic graphs from LLM-inferred feature relationships. The approach addresses privacy concerns of direct LLM tabular data processing while achieving state-of-the-art performance on few-shot benchmarks through intelligent graph refinement that filters LLM hallucinations.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

Researchers propose Causal Ensemble Agent (CEA), a framework that combines multiple causal discovery algorithms with LLM-guided expert reweighting to improve accuracy in identifying causal relationships from data. The approach addresses limitations of existing methods by dynamically weighting statistical insights and leveraging domain knowledge, demonstrating superior performance across synthetic and real-world datasets.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Position: The ML Community Must Build an AI-Augmented Peer-Review Ecosystem

A position paper argues that the machine learning community must develop an AI-augmented peer-review ecosystem to address the crisis of scale in scientific publishing. With manuscript submissions exponentially outpacing qualified reviewers at premier ML venues, the authors propose using LLMs as collaborators—not replacements—to enhance factual verification, reviewer performance, author quality improvement, and administrative decision-making while maintaining scientific integrity.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

MetaPlate: Counterfactual-Guided RAG-LLM Tool for Personalized Food Recommendation and Hyperglycemia Prevention

MetaPlate is an AI-powered dietary decision-support system that combines counterfactual explanations, continuous glucose monitoring data, and large language models to generate personalized meal recommendations for preventing postprandial hyperglycemia. The system demonstrated improved clinical plausibility and actionability through expert validation with registered dietitians, showcasing how domain-specific constraints enhance LLM reliability in healthcare applications.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

LLM-Orchestrated Conformance Checking in Stroke Care Without Computer-Interpretable Guidelines

Researchers developed an LLM-orchestrated framework that automates conformance checking in healthcare by extracting patient care pathways and clinical guidelines from unstructured text, eliminating the need for formal Computer-Interpretable Guidelines. Testing at Alessandria Hospital's neurological ward showed 86% of stroke care traces adhered to clinical guidelines, demonstrating practical feasibility of AI-driven healthcare compliance assessment.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Researchers propose an optimized system for running vision-language models on UAVs in low-altitude networks, combining resource allocation algorithms with LLM-enhanced reinforcement learning to minimize latency and power consumption while maintaining inference accuracy. The framework addresses a critical challenge in aerial IoT applications where onboard computational constraints and dynamic network conditions limit real-time multimodal data processing.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

WhiteTesseract combines extended reality (XR) and conversational AI to enhance cultural heritage exhibitions by enabling personalized, context-aware interpretation of artworks while preserving the physical viewing experience. A controlled study at a Monet exhibition demonstrated that the system nearly tripled average viewing time (35.3 to 98.3 seconds) and prompted 60% of visitor-AI interactions to move beyond factual queries into analytical and emotional engagement.

🧠 Claude
AI · Neutral · arXiv – CS AI · 5d ago · 5/10

Telling stories, making Hanzi: AI-assisted co-creation with elderly migrants in urban China

Researchers conducted AI-assisted co-creation workshops with 10 elderly migrants in urban China, combining storytelling, large language models, and handcrafting to create new Hanzi characters that preserve personal narratives. The study demonstrates how AI can lower creative expression barriers for older adults with limited digital literacy while challenging stereotypes about aging populations.

AI · Neutral · arXiv – CS AI · 5d ago · 5/10

Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

Researchers introduce a standardized taxonomy for classifying invalid bug reports and develop AI methods to automatically identify root causes and generate no-code fixes. Testing retrieval-augmented generation, vanilla LLMs, and agentic web search, they achieve a 66% weighted F1-score for subclassification and a 68.9% success rate for fix generation, demonstrating significant potential for automating customer support workflows.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

Researchers developed Binary Gaussian Copula Synthesis (BGCS), an LLM-powered data augmentation method that addresses severe class imbalance in chronic kidney disease datasets to improve early dialysis prediction. Tested on 15,169 CKD patients, BGCS outperformed existing methods like SMOTE and CTGAN, achieving 78-87% minority-class recall and enabling deployment in interpretable clinical decision-support systems.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Rethinking Sales Lead Scoring with LLM-based Hierarchical Preference Ranking

Researchers introduce HPRO, an LLM-based framework for sales lead scoring that combines structured CRM data with unstructured customer interactions using hierarchical preference ranking. A 132-day A/B test with a major NEV manufacturer showed a 9.5% sales volume uplift and a 39.7% precision improvement, demonstrating practical commercial viability beyond traditional machine learning approaches.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

PropLLM: Propagation-Aware Scene Reconstruction for Network Fault Diagnosis

PropLLM is an AI system that diagnoses network faults by tracing propagation paths backward from symptomatic alerts, using large language models combined with knowledge graphs. The approach achieves a 3.9% improvement in fault diagnosis accuracy and reduces hallucinations by 50.8% compared to existing methods, with validation across Wi-Fi and 5G networks.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

Researchers present an NLP framework that uses large language models and semantic matching to extract competencies from educational curricula and align them with labor-market demands. Applied to a UAE university's computer science program, the system identified significant gaps in general skills and algorithms while finding near-zero gaps in AI/data science, demonstrating a scalable approach to curriculum-labor market alignment.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

LLMs for Cardiovascular Risk Prediction from Structured Clinical Data

Researchers developed a hybrid framework combining structured clinical data with large language models to predict coronary artery disease, achieving 94.61% fidelity in converting patient records to natural language narratives. While traditional machine learning outperformed LLMs in accuracy, the study demonstrates that LLM-based classification offers significant privacy advantages by eliminating exposure of sensitive numerical patient data in clinical prediction systems.

🧠 Gemini