88 articles tagged with #natural-language-processing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers introduce SpreadsheetArena, a platform for evaluating large language models' ability to generate spreadsheet workbooks from natural language prompts. The study reveals that preferred spreadsheet features vary significantly across use cases, and even top-performing models struggle with domain-specific best practices in areas like finance.
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers propose new uncertainty elicitation techniques for large language models, using an imprecise-probabilities framework to better capture higher-order uncertainty. The approach addresses systematic failures in ambiguous question answering and self-reflection by quantifying both first-order uncertainty over responses and second-order uncertainty about the probability model itself.
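The first-order/second-order distinction in this summary can be illustrated with a toy credal set, i.e. a set of plausible answer distributions. This is a hedged sketch of the general idea only, not the paper's method; the distributions and the yes/no question are invented.

```python
# Toy illustration, not the paper's method: second-order uncertainty is
# represented here as a credal set, a set of plausible answer
# distributions (e.g. elicited across paraphrased prompts). The width
# of the lower/upper probability interval expresses higher-order
# uncertainty that a single distribution cannot.

def probability_interval(credal_set, answer):
    """Lower and upper probability of `answer` across the credal set."""
    probs = [dist.get(answer, 0.0) for dist in credal_set]
    return min(probs), max(probs)

# Three elicited distributions for an ambiguous yes/no question.
credal_set = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.6, "no": 0.4},
    {"yes": 0.2, "no": 0.8},
]

low, high = probability_interval(credal_set, "yes")
print(low, high)  # a wide interval signals an ambiguous question
```

A single averaged distribution would hide how much the elicited distributions disagree; the interval width makes that disagreement explicit.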
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers introduce CBR-to-SQL, a new framework using Case-Based Reasoning to improve natural language-to-SQL translation for healthcare databases. The system addresses limitations of standard RAG approaches by using two-stage retrieval and abstract case templates, achieving state-of-the-art results on medical datasets.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers have developed MASFactory, a new graph-centric framework for orchestrating Large Language Model-based Multi-Agent Systems (MAS). The framework introduces 'Vibe Graphing,' which allows users to compile natural language instructions into executable workflow graphs, making complex AI agent coordination more accessible and reusable.
AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
🧠 Researchers propose Adaptive Memory Admission Control (A-MAC), a new framework for managing long-term memory in LLM-based agents. The system improves memory precision-recall by 31% while reducing latency through structured decision-making based on five interpretable factors rather than opaque LLM-driven policies.
AI · Neutral · arXiv – CS AI · Mar 6 · 6/10
🧠 Researchers introduce ICR (Inductive Conceptual Rating), a new qualitative metric for evaluating meaning in large language model text summaries that goes beyond simple word similarity. The study found that while LLMs achieve high linguistic similarity to human outputs, they significantly underperform in semantic accuracy and capturing contextual meanings.
AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
🧠 Researchers introduce the What Is Missing (WIM) rating system for Large Language Models that uses natural-language feedback instead of numerical ratings to improve preference learning. WIM computes ratings by analyzing cosine similarity between model outputs and judge feedback embeddings, producing more interpretable and effective training signals with fewer ties than traditional rating methods.
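The general shape of the decision the summary describes, admission gated by a weighted score over interpretable factors, can be sketched in a few lines. The paper's five factors are not named here, so the factor names, weights, and threshold below are placeholders.

```python
# Illustrative sketch only: the five factors, their weights, and the
# threshold are placeholders, since the summary does not name them.
# The point is the shape of the decision: a candidate memory is
# admitted when a weighted score of interpretable signals clears a
# threshold, instead of delegating the call to an opaque LLM policy.

FACTOR_WEIGHTS = {
    "novelty": 0.3,
    "task_relevance": 0.3,
    "specificity": 0.15,
    "recency": 0.15,
    "source_reliability": 0.1,
}

def admit(factor_scores, threshold=0.5):
    """Admit a candidate memory if its weighted score clears the threshold."""
    score = sum(w * factor_scores[f] for f, w in FACTOR_WEIGHTS.items())
    return score >= threshold

print(admit({"novelty": 0.9, "task_relevance": 0.8, "specificity": 0.5,
             "recency": 0.4, "source_reliability": 1.0}))  # → True
```

Unlike an LLM-driven policy, every rejection can be traced back to the individual factor scores that caused it.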
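The rating mechanism as summarized, cosine similarity between an output embedding and a judge-feedback embedding, reduces to a few lines. The vectors below are stand-ins for real sentence embeddings, not WIM's actual encoder.

```python
import math

# Minimal sketch of the mechanism as summarized: the rating is the
# cosine similarity between an embedding of the model's output and an
# embedding of the judge's natural-language feedback. The vectors here
# are stand-ins; a real system would use a sentence-embedding model.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

output_emb = [0.8, 0.1, 0.6]    # embedding of the model's answer
feedback_emb = [0.7, 0.2, 0.5]  # embedding of the judge's feedback

rating = cosine(output_emb, feedback_emb)
print(round(rating, 3))  # ≈ 0.991
```

Because the similarity is a continuous score rather than a small set of discrete grades, ties between candidate responses are naturally rarer.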
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3
🧠 Researchers propose a new framework for handling ambiguity in natural language queries for tabular data analysis, reframing ambiguity as a cooperative feature rather than a deficiency. The study analyzes 15 datasets and finds that current evaluation methods inadequately assess both system accuracy and interpretation capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 Researchers have developed Thoth, the first family of Large Language Models specifically designed to understand and reason about time series data through a mid-training approach. The model uses a specialized corpus called Book-of-Thoth to bridge the gap between temporal data and natural language, significantly outperforming existing LLMs in time series analysis tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠 Researchers introduce TripleSumm, a novel AI architecture that adaptively fuses visual, text, and audio modalities for improved video summarization. The team also releases MoSu, the first large-scale benchmark dataset providing all three modalities for multimodal video summarization research.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 Researchers have developed RGLM, a new approach to improve how large language models understand and process graph data by incorporating explicit graph supervision alongside text instructions. The method addresses limitations in existing Graph-Tokenizing LLMs that rely too heavily on text supervision, leading to underutilization of graph context.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers have developed the Cognitive Prosthetic Multimodal System (CPMS), an AI-enabled proof-of-concept that helps knowledge workers recall workplace experiences by capturing speech, physiological signals, and gaze behavior into queryable episodic memories. The system processes data locally for privacy and allows natural language queries to retrieve past workplace interactions based on semantic content, time, attention, or physiological state.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers introduced EHR-ChatQA, a new benchmark for testing AI agents that interact with Electronic Health Record databases through natural language queries. The benchmark reveals significant reliability gaps in current state-of-the-art LLMs, with success rates dropping substantially when consistency across multiple trials is required.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduce WavefrontDiffusion, a new dynamic decoding approach for Diffusion Language Models that improves text generation quality by expanding from finalized positions rather than using fixed blocks. The method achieves state-of-the-art performance on reasoning and code generation benchmarks while maintaining computational efficiency equivalent to existing block-based methods.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 A comprehensive study of 17 Large Language Models as automated annotators for Bangla hate speech detection reveals significant bias and instability issues. The research found that larger models don't necessarily perform better than smaller, task-specific ones, raising concerns about LLM reliability for sensitive annotation tasks in low-resource languages.
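The decoding-order idea, expanding outward from finalized positions rather than stepping through fixed blocks, can be shown with a toy simulation. This is an illustration of the ordering only; a real diffusion decoder would pick the next position by model confidence, which the sketch replaces with a deterministic rule.

```python
# Toy illustration of the decoding-order idea only, not the actual
# algorithm: block decoding finalizes fixed contiguous chunks left to
# right, while a wavefront expands outward from already-finalized
# positions, so the order adapts to where tokens have been committed.

def wavefront_order(length, seeds):
    """Order in which positions get finalized, expanding from seeds."""
    finalized = list(seeds)
    frontier = set(seeds)
    while len(finalized) < length:
        # Candidates: unfinalized neighbours of any finalized position.
        candidates = sorted(
            p
            for f in frontier
            for p in (f - 1, f + 1)
            if 0 <= p < length and p not in frontier
        )
        nxt = candidates[0]  # a real model would pick by confidence
        finalized.append(nxt)
        frontier.add(nxt)
    return finalized

print(wavefront_order(6, [2]))  # → [2, 1, 0, 3, 4, 5]
```

Seeding the wavefront mid-sequence lets decoding grow in both directions, which a fixed left-to-right block schedule cannot do.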
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠 Researchers have developed SleepLM, a family of AI foundation models that combine natural language processing with sleep analysis using polysomnography data. The system can interpret and describe sleep patterns in natural language, trained on over 100K hours of sleep data from 10,000+ individuals, enabling new capabilities like language-guided sleep event detection and zero-shot generalization to novel sleep analysis tasks.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12
🧠 Researchers have introduced Hello-Chat, an end-to-end audio language model designed to create more realistic and emotionally resonant AI conversations. The model addresses the robotic nature of existing Large Audio Language Models by using real-life conversation data and achieving breakthrough performance in prosodic naturalness and emotional alignment.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 10
🧠 Researchers developed the TREC 2025 DRAGUN Track to evaluate AI systems that help readers assess news trustworthiness through automated report generation. The initiative created reusable evaluation resources including human-assessed rubrics and an AutoJudge system that correlates well with human evaluations for RAG-based news analysis tools.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4
🧠 Researchers introduce SOTAlign, a new framework for aligning vision and language AI models using minimal supervised data. The method uses optimal transport theory to achieve better alignment with significantly less paired training data than traditional approaches.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠 Researchers propose KGT, a novel framework that bridges the gap between Large Language Models and Knowledge Graph Completion by using dedicated entity tokens for full-space prediction. The approach addresses fundamental granularity mismatches through specialized tokenization, feature fusion, and decoupled prediction mechanisms.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠 Researchers developed a three-stage framework using Small Language Models (SLMs) to automatically translate natural language queries into Kusto Query Language (KQL) for cybersecurity operations. The approach achieves high accuracy (98.7% syntax, 90.6% semantic) while reducing costs by up to 10x compared to GPT-4, potentially solving bottlenecks in Security Operations Centers.
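The summary only says SOTAlign rests on optimal transport, so the sketch below shows the generic entropy-regularized Sinkhorn iteration computing a soft matching between a handful of toy vision and text embeddings, not the paper's actual formulation or dimensions.

```python
import numpy as np

# Hedged sketch: generic Sinkhorn iteration for entropy-regularized
# optimal transport between two uniform point sets. The embeddings,
# cost, and regularization strength are toy values chosen for clarity,
# not SOTAlign's setup.

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropy-regularized OT plan between two uniformly weighted sets."""
    n, m = cost.shape
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n) / n
    v = np.ones(m) / m
    for _ in range(iters):           # alternate marginal projections
        u = (np.ones(n) / n) / (K @ v)
        v = (np.ones(m) / m) / (K.T @ u)
    return u[:, None] * K * v[None, :]

vision = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy vision embeddings
text = np.array([[0.9, 0.1], [0.1, 0.9]])     # toy text embeddings
cost = np.linalg.norm(vision[:, None] - text[None, :], axis=-1)
plan = sinkhorn(cost)
print(plan.round(2))  # mass concentrates on the matching pairs
```

The appeal for low-supervision alignment is that the transport plan matches whole sets of embeddings at once, rather than requiring an explicit pairing label for every example.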
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠 Researchers introduce MoDora, an AI-powered system that uses tree-based analysis to understand and answer questions about semi-structured documents containing mixed data elements like tables, charts, and text. The system addresses challenges in processing fragmented OCR data and hierarchical document structures, achieving 5.97%-61.07% accuracy improvements over existing baselines.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠 Researchers introduce SPARTA, an automated framework for generating large-scale Table-Text question answering benchmarks that require complex multi-hop reasoning across structured and unstructured data. The benchmark exposes significant weaknesses in current AI models, with state-of-the-art systems experiencing over 30 F1 point performance drops compared to existing simpler datasets.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠 Researchers developed LEREDD, an LLM-based system that automates the detection of dependencies between software requirements using Retrieval-Augmented Generation and In-Context Learning. The system achieved 93% accuracy in classifying requirement dependencies, significantly outperforming existing baselines with relative gains of over 94% in F1 scores for specific dependency types.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7
🧠 Researchers developed a multimodal AI framework using transformer-based large language models to analyze the critical first three seconds of video advertisements. The system combines visual, auditory, and textual analysis to predict ad performance metrics and optimize video advertising strategies.
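A minimal sketch of the RAG-plus-in-context-learning pattern the summary describes: retrieve the labelled requirement pairs most similar to the query pair and pack them into a few-shot classification prompt. None of the prompt format, examples, or labels come from the paper, and toy token-overlap similarity stands in for a real embedding-based retriever.

```python
# Generic RAG + in-context-learning sketch, not LEREDD's actual
# prompts or retriever: the k most similar labelled requirement pairs
# become few-shot examples for an LLM dependency classifier.

def similarity(a, b):
    """Jaccard token overlap, a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def build_prompt(query_pair, labelled_pairs, k=2):
    query_text = " ".join(query_pair)
    shots = sorted(labelled_pairs,
                   key=lambda ex: similarity(query_text, " ".join(ex[:2])),
                   reverse=True)[:k]
    lines = [f"R1: {r1}\nR2: {r2}\nDependency: {label}\n"
             for r1, r2, label in shots]
    lines.append(f"R1: {query_pair[0]}\nR2: {query_pair[1]}\nDependency:")
    return "\n".join(lines)

labelled = [
    ("The system shall encrypt data.",
     "The system shall store keys securely.", "requires"),
    ("The UI shall be responsive.",
     "The app shall support mobile.", "refines"),
]
prompt = build_prompt(("The system shall encrypt traffic.",
                       "The system shall manage keys."), labelled, k=1)
print(prompt)
```

The completed prompt would then be sent to an LLM, whose next-token continuation after "Dependency:" is taken as the predicted dependency type.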