AI · Neutral · arXiv – CS AI · 2d ago · 7/10
🧠Researchers present the Minimum Viable Evaluation Suite (MVES), a framework for systematically testing LLM applications, revealing that generic prompt improvements often fail to deliver consistent gains and can cause significant performance regressions. Testing on local models showed that adding generic rules to prompts degraded RAG citation compliance by up to 70%, underscoring the need for rigorous, task-specific evaluation before deployment.
🧠 Llama
AI · Bullish · arXiv – CS AI · 5d ago · 7/10
🧠Researchers introduced ReclAIm, a multi-agent AI framework using large language models to automatically detect and correct performance degradation in medical imaging classification models. The system successfully restored models experiencing up to 40.6% performance decline to within 2% of baseline values through automated fine-tuning, demonstrating practical viability for maintaining AI reliability in clinical settings.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers propose MAAD (Multi-Agent Architecture Design), a framework using orchestrated AI agents with external knowledge and hierarchical memory to automate software architecture design from requirements. The system outperforms existing approaches and demonstrates that advanced LLMs significantly improve architectural quality and validation efficiency.
🧠 GPT-5
AI · Neutral · arXiv – CS AI · Jun 2 · 7/10
🧠DARPA's AI Cyber Challenge (AIxCC, 2023-2025) represents the largest competition to date for autonomous cyber reasoning systems powered by large language models, tasked with discovering and fixing vulnerabilities in real-world open-source software. This systematic analysis examines competition design, finalist architectures, and performance drivers, revealing both genuine technical advances and remaining limitations in autonomous cybersecurity systems.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers introduce LASEV, an LLM-based multi-agent system that generates educational videos by decomposing production into specialized agents rather than relying on end-to-end video models. The system achieves a 95% cost reduction and a throughput of over one million videos daily while maintaining high quality through structured reasoning, semantic critique, and deterministic compilation.
AI · Bullish · OpenAI News · May 28 · 7/10
🧠Endava leverages Codex to transform into an agentic organization, enabling AI-driven automation of software development workflows. The approach dramatically accelerates delivery timelines and compresses requirements analysis from weeks to mere hours, signaling a shift toward AI-augmented enterprise operations.
AI · Bullish · arXiv – CS AI · May 27 · 7/10
🧠GENESIS is an AI framework that automates the research and development of 6G cellular networks by converting specifications and research into validated production code through over-the-air testing. The system addresses critical limitations of LLMs in radio access networks by combining AI agents with persistent knowledge management and real-world hardware validation rather than relying solely on simulations.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce FinAgent-RAG, an advanced AI framework designed to answer complex financial questions by combining iterative retrieval, reasoning, and self-verification. The system achieves 76-78% accuracy on financial benchmarks while reducing computational costs by 41%, demonstrating practical viability for institutional financial analysis.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce BeliefSim, a framework that uses Large Language Models to simulate how different demographic groups are susceptible to misinformation based on their underlying beliefs. The system achieves up to 92% accuracy in predicting misinformation susceptibility by incorporating psychology-informed belief profiles.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers present ShipFinance.ai, an AI-powered system using large language models to streamline ship finance loan origination by automating document processing, information extraction, and workflow management across complex maritime lending. The system addresses growing complexity in the sector driven by environmental regulations and ESG reporting requirements, offering maritime finance professionals tools to manage increasingly sophisticated underwriting processes.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠TAROT is a new GNN-based framework that improves few-shot tabular learning by constructing task-adaptive semantic graphs from LLM-inferred feature relationships. The approach addresses privacy concerns of direct LLM tabular data processing while achieving state-of-the-art performance on few-shot benchmarks through intelligent graph refinement that filters LLM hallucinations.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose Causal Ensemble Agent (CEA), a framework that combines multiple causal discovery algorithms with LLM-guided expert reweighting to improve accuracy in identifying causal relationships from data. The approach addresses limitations of existing methods by dynamically weighting statistical insights and leveraging domain knowledge, demonstrating superior performance across synthetic and real-world datasets.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A position paper argues that the machine learning community must develop an AI-augmented peer-review ecosystem to address the crisis of scale in scientific publishing. With manuscript submissions exponentially outpacing qualified reviewers at premier ML venues, the authors propose using LLMs as collaborators—not replacements—to enhance factual verification, reviewer performance, author quality improvement, and administrative decision-making while maintaining scientific integrity.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠MetaPlate is an AI-powered dietary decision-support system that combines counterfactual explanations, continuous glucose monitoring data, and large language models to generate personalized meal recommendations for preventing postprandial hyperglycemia. The system demonstrated improved clinical plausibility and actionability through expert validation with registered dietitians, showcasing how domain-specific constraints enhance LLM reliability in healthcare applications.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers developed an LLM-orchestrated framework that automates conformance checking in healthcare by extracting patient care pathways and clinical guidelines from unstructured text, eliminating the need for formal Computer-Interpretable Guidelines. Testing at Alessandria Hospital's neurological ward showed 86% of stroke care traces adhered to clinical guidelines, demonstrating practical feasibility of AI-driven healthcare compliance assessment.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose an optimized system for running vision-language models on UAVs in low-altitude networks, combining resource allocation algorithms with LLM-enhanced reinforcement learning to minimize latency and power consumption while maintaining inference accuracy. The framework addresses a critical challenge in aerial IoT applications where onboard computational constraints and dynamic network conditions limit real-time multimodal data processing.
AI · Bullish · arXiv – CS AI · 4d ago · 6/10
🧠WhiteTesseract combines extended reality (XR) and conversational AI to enhance cultural heritage exhibitions by enabling personalized, context-aware interpretation of artworks while preserving the physical viewing experience. A controlled study at a Monet exhibition demonstrated that the system nearly tripled average viewing time (35.3 to 98.3 seconds) and prompted 60% of visitor-AI interactions to move beyond factual queries into analytical and emotional engagement.
🧠 Claude
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers develop a large language model framework for predicting vessel trajectories and destinations up to 30 days in advance using reinforcement learning with verifiable rewards. The approach outperforms traditional deep learning methods by maintaining route feasibility and destination accuracy over extended maritime forecasting horizons.
AI · Neutral · arXiv – CS AI · 5d ago · 5/10
🧠Researchers conducted AI-assisted co-creation workshops with 10 elderly migrants in urban China, combining storytelling, large language models, and handcrafting to create new Hanzi characters that preserve personal narratives. The study demonstrates how AI can lower creative expression barriers for older adults with limited digital literacy while challenging stereotypes about aging populations.
AI · Neutral · arXiv – CS AI · 5d ago · 5/10
🧠Researchers introduce a standardized taxonomy for classifying invalid bug reports and develop AI methods to automatically identify root causes and generate no-code fixes. Testing retrieval-augmented generation, vanilla LLMs, and agentic web search, they achieve a 66% weighted F1-score for subclassification and a 68.9% success rate for fix generation, demonstrating significant potential for automating customer support workflows.
AI · Bullish · arXiv – CS AI · Jun 5 · 6/10
🧠Researchers developed Binary Gaussian Copula Synthesis (BGCS), an LLM-augmented data augmentation method that addresses severe class imbalance in chronic kidney disease datasets to improve early dialysis prediction. Tested on 15,169 CKD patients, BGCS outperformed existing methods like SMOTE and CTGAN, achieving 78-87% minority-class recall and enabling deployment in interpretable clinical decision-support systems.
AI · Neutral · arXiv – CS AI · Jun 4 · 6/10
🧠Researchers introduce HPRO, an LLM-based framework for sales lead scoring that combines structured CRM data with unstructured customer interactions using hierarchical preference ranking. A 132-day A/B test with a major NEV manufacturer showed a 9.5% sales volume uplift and a 39.7% precision improvement, demonstrating practical commercial viability beyond traditional machine learning approaches.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠PropLLM is a novel AI system that diagnoses network faults by tracing propagation paths backward from symptomatic alerts using large language models combined with knowledge graphs. The approach achieves a 3.9% improvement in fault diagnosis accuracy and reduces hallucinations by 50.8% compared to existing methods, with validation across Wi-Fi and 5G networks.
AI · Neutral · arXiv – CS AI · Jun 2 · 5/10
🧠Researchers present an NLP framework that uses large language models and semantic matching to extract competencies from educational curricula and align them with labor-market demands. Applied to a UAE university's computer science program, the system identified significant gaps in general skills and algorithms while finding near-zero gaps in AI/data science, demonstrating a scalable approach to curriculum-labor market alignment.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers developed a hybrid framework combining structured clinical data with large language models to predict coronary artery disease, achieving 94.61% fidelity in converting patient records to natural language narratives. While traditional machine learning outperformed LLMs in accuracy, the study demonstrates that LLM-based classification offers significant privacy advantages by eliminating exposure of sensitive numerical patient data in clinical prediction systems.
🧠 Gemini