AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
🧠Researchers introduce the What Is Missing (WIM) rating system for Large Language Models that uses natural-language feedback instead of numerical ratings to improve preference learning. WIM computes ratings by analyzing cosine similarity between model outputs and judge feedback embeddings, producing more interpretable and effective training signals with fewer ties than traditional rating methods.
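This summary does not give WIM's exact scoring formula. Below is a minimal sketch of the cosine-similarity step. It assumes the model output and the judge's natural-language feedback have already been embedded as vectors. The toy vectors are hypothetical stand-ins for a real embedding model, and the sketch leaves open how the similarity becomes a final rating.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for a real embedding model.
output_emb = [0.2, 0.7, 0.1]     # embedding of a model response
feedback_emb = [0.3, 0.6, 0.4]   # embedding of the judge's "what is missing" feedback
similarity = cosine_similarity(output_emb, feedback_emb)
print(f"similarity = {similarity:.3f}")
```

The result is a real number rather than a point on a small integer scale. Two outputs therefore rarely get exactly the same score, which is one plausible reading of the "fewer ties" claim.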
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3
🧠Researchers propose a new framework for handling ambiguity in natural language queries for tabular data analysis, reframing ambiguity as a cooperative feature rather than a deficiency. The study analyzes 15 datasets and finds that current evaluation methods inadequately assess both system accuracy and interpretation capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers have developed Thoth, the first family of Large Language Models specifically designed to understand and reason about time series data through a mid-training approach. The model uses a specialized corpus called Book-of-Thoth to bridge the gap between temporal data and natural language, significantly outperforming existing LLMs in time series analysis tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers introduce TripleSumm, a novel AI architecture that adaptively fuses visual, text, and audio modalities for improved video summarization. The team also releases MoSu, the first large-scale benchmark dataset providing all three modalities for multimodal video summarization research.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers have developed RGLM, a new approach to improve how large language models understand and process graph data by incorporating explicit graph supervision alongside text instructions. The method addresses limitations in existing Graph-Tokenizing LLMs that rely too heavily on text supervision, leading to underutilization of graph context.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers have developed the Cognitive Prosthetic Multimodal System (CPMS), an AI-enabled proof-of-concept that helps knowledge workers recall workplace experiences by capturing speech, physiological signals, and gaze behavior into queryable episodic memories. The system processes data locally for privacy and allows natural language queries to retrieve past workplace interactions based on semantic content, time, attention, or physiological state.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers introduced EHR-ChatQA, a new benchmark for testing AI agents that interact with Electronic Health Record databases through natural language queries. The benchmark reveals significant reliability gaps in current state-of-the-art LLMs, with success rates dropping substantially when consistency across multiple trials is required.
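The summary does not name the benchmark's consistency metric. One common way to measure "success across multiple trials" is to estimate the probability that all k independent attempts at a task succeed. It is usually reported next to the chance that at least one attempt succeeds. This is an assumption here, not confirmed as EHR-ChatQA's metric. The sketch below uses unbiased combinatorial estimators computed from n recorded trials with c successes. The trial counts are hypothetical.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that k trials of a task ALL succeed,
    given c successes observed in n trials (requires k <= n)."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    return comb(c, k) / comb(n, k)  # comb(c, k) is 0 when k > c


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that AT LEAST ONE of k trials succeeds."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical results: (trials, successes) for three tasks.
trials = [(8, 8), (8, 6), (8, 4)]
for k in (1, 2, 4):
    all_succeed = sum(pass_hat_k(n, c, k) for n, c in trials) / len(trials)
    any_succeed = sum(pass_at_k(n, c, k) for n, c in trials) / len(trials)
    print(f"k={k}: all-k={all_succeed:.3f}  at-least-one={any_succeed:.3f}")
```

With k = 1 the two estimates coincide. As k grows, the all-trials estimate can only stay the same or fall, because it is a product of factors (c - i)/(n - i) that are each at most 1. This is why requiring consistency exposes reliability gaps.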
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠Researchers introduce WavefrontDiffusion, a new dynamic decoding approach for Diffusion Language Models that improves text generation quality by expanding from finalized positions rather than using fixed blocks. The method achieves state-of-the-art performance on reasoning and code generation benchmarks while maintaining computational efficiency equivalent to existing block-based methods.
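The summary describes the idea only at a high level. The toy sketch below contrasts two ways of choosing which still-masked positions may be decoded next: a fixed-block schedule, and a wavefront that grows outward from positions already finalized. The helper names and the `width` parameter are hypothetical. This is an illustration, not the paper's algorithm.

```python
def block_candidates(finalized: set[int], length: int, block_size: int) -> list[int]:
    """Fixed blocks: eligible positions are the masked ones in the leftmost
    block that still contains any masked position."""
    for start in range(0, length, block_size):
        block = range(start, min(start + block_size, length))
        masked = [i for i in block if i not in finalized]
        if masked:
            return masked
    return []  # every position is finalized


def wavefront_candidates(finalized: set[int], length: int, width: int) -> list[int]:
    """Toy wavefront: eligible positions are masked ones within `width`
    positions of any finalized position. Assumes at least one position
    (e.g. the end of the prompt) is already finalized."""
    return [
        i
        for i in range(length)
        if i not in finalized and any(abs(i - j) <= width for j in finalized)
    ]


finalized = {0, 1, 5}  # hypothetical positions already decoded
print(block_candidates(finalized, length=10, block_size=4))   # → [2, 3]
print(wavefront_candidates(finalized, length=10, width=1))    # → [2, 4, 6]
```

Until the first block is complete, the block schedule can only fill positions 2 and 3. The wavefront can also expand around the finalized token at position 5.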
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠A comprehensive study of 17 Large Language Models as automated annotators for Bangla hate speech detection reveals significant bias and instability issues. The research found that larger models don't necessarily perform better than smaller, task-specific ones, raising concerns about LLM reliability for sensitive annotation tasks in low-resource languages.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠Researchers have developed SleepLM, a family of AI foundation models that combine natural language processing with sleep analysis using polysomnography data. The system can interpret and describe sleep patterns in natural language, trained on over 100K hours of sleep data from 10,000+ individuals, enabling new capabilities like language-guided sleep event detection and zero-shot generalization to novel sleep analysis tasks.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12
🧠Researchers have introduced Hello-Chat, an end-to-end audio language model designed to create more realistic and emotionally resonant AI conversations. The model addresses the robotic nature of existing Large Audio Language Models by using real-life conversation data and achieving breakthrough performance in prosodic naturalness and emotional alignment.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 10
🧠Researchers organized the TREC 2025 DRAGUN Track to evaluate AI systems that help readers assess the trustworthiness of news through automatically generated reports. The initiative produced reusable evaluation resources, including human-assessed rubrics and an AutoJudge system whose judgments correlate well with human evaluations of RAG-based news analysis tools.
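The summary does not say which correlation statistic was used. Kendall's τ over per-system scores is a common choice in TREC-style evaluation and is assumed in this sketch. It counts how often an automatic judge and human assessors agree on the relative order of two systems. The scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a between two paired score lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both judges order systems i and j the same way
        elif sign < 0:
            discordant += 1   # the judges disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-system scores: human assessors vs. an automatic judge.
human = [0.62, 0.55, 0.71, 0.40]
auto = [0.60, 0.61, 0.69, 0.35]
print(round(kendall_tau(human, auto), 3))  # → 0.667 (one of six pairs swapped)
```

A value of 1 means identical system rankings and -1 means fully reversed rankings. Production code would typically use a tie-corrected variant such as tau-b.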
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers introduce MoDora, an AI-powered system that uses tree-based analysis to understand and answer questions about semi-structured documents that mix elements such as tables, charts, and text. The system addresses the challenges of fragmented OCR output and hierarchical document structure, achieving accuracy improvements of 5.97% to 61.07% over existing baselines.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers introduce SPARTA, an automated framework for generating large-scale Table-Text question answering benchmarks that require complex multi-hop reasoning across structured and unstructured data. The benchmark exposes significant weaknesses in current AI models: state-of-the-art systems score more than 30 F1 points lower on it than on existing, simpler datasets.
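The summary does not define the F1 variant. This sketch assumes the SQuAD-style token-overlap F1 common in question answering, in simplified form. It compares a predicted answer and a reference answer as bags of tokens.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between a predicted and a reference answer.
    Simplified: lowercases and splits on whitespace, without the
    punctuation and article stripping some QA evaluators apply."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("the red car", "red car"), 3))  # → 0.8
```

Benchmarks usually report the mean over questions scaled to 0–100. Under that assumption, a drop of more than 30 F1 points corresponds to the mean of this function falling by more than 0.30.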
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers propose KGT, a novel framework that bridges the gap between Large Language Models and Knowledge Graph Completion by using dedicated entity tokens for full-space prediction. The approach addresses fundamental granularity mismatches through specialized tokenization, feature fusion, and decoupled prediction mechanisms.
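The summary does not describe KGT's prediction head. This hypothetical sketch shows one way to read "full-space prediction" over dedicated entity tokens. Each candidate entity is scored against a query vector, and a single softmax normalizes over the whole entity vocabulary, instead of generating the entity name token by token. The entity names, vectors, and query are made up for illustration.

```python
import math


def full_space_distribution(
    query: list[float], entity_embeddings: dict[str, list[float]]
) -> dict[str, float]:
    """Softmax over ALL candidate entities, using dot-product logits."""
    logits = {
        name: sum(q * e for q, e in zip(query, emb))
        for name, emb in entity_embeddings.items()
    }
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(logit - max_logit) for name, logit in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical entity-token embeddings and a query for (France, has_capital, ?).
entities = {"Paris": [1.0, 0.0], "Berlin": [0.0, 1.0], "Rome": [0.5, 0.5]}
query = [2.0, 0.1]
probs = full_space_distribution(query, entities)
best = max(probs, key=probs.get)
print(best)  # → Paris
```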
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4
🧠Researchers introduce SOTAlign, a new framework for aligning vision and language AI models using minimal supervised data. The method uses optimal transport theory to achieve better alignment with significantly less paired training data than traditional approaches.
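The summary does not say which optimal-transport solver SOTAlign uses. Entropy-regularized OT computed with Sinkhorn iterations is one standard choice and is assumed in this sketch. The sketch builds a soft matching between a few image and text embeddings from a cost matrix. The costs are hypothetical (for example, 1 minus cosine similarity).

```python
import math


def sinkhorn(
    cost: list[list[float]],
    a: list[float],
    b: list[float],
    epsilon: float = 0.5,
    iters: int = 500,
) -> list[list[float]]:
    """Entropy-regularized optimal transport plan between histograms a and b.

    Alternately rescales the rows and columns of K = exp(-cost / epsilon) so
    the plan's row sums approach a and its column sums equal b. Smaller
    epsilon gives sharper plans but converges more slowly and can underflow;
    log-domain implementations avoid the underflow."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical costs between 2 image embeddings (rows) and 2 text embeddings (columns).
cost = [[0.1, 0.9],
        [0.8, 0.2]]
plan = sinkhorn(cost, a=[0.5, 0.5], b=[0.5, 0.5])
for row in plan:
    print([round(p, 3) for p in row])
```

Most of the mass lands on the low-cost diagonal, so image 0 is softly paired with text 0 and image 1 with text 1. Plans like this can act as pseudo-correspondences when few explicit image-text pairs are available.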
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers developed MALLET, a multi-agent AI system that reduces emotional intensity in news content by up to 19.3% while preserving semantic meaning. The system uses four specialized agents to analyze, adjust, and personalize content presentation modes for calmer decision-making without restricting access to original information.
$NEAR
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7
🧠Researchers developed a multimodal AI framework using transformer-based large language models to analyze the critical first three seconds of video advertisements. The system combines visual, auditory, and textual analysis to predict ad performance metrics and optimize video advertising strategies.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers developed LEREDD, an LLM-based system that automates the detection of dependencies between software requirements using Retrieval-Augmented Generation and In-Context Learning. The system achieved 93% accuracy in classifying requirement dependencies, significantly outperforming existing baselines with relative gains of over 94% in F1 scores for specific dependency types.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers developed a three-stage framework using Small Language Models (SLMs) to automatically translate natural language queries into Kusto Query Language (KQL) for cybersecurity operations. The approach achieves high accuracy (98.7% syntax, 90.6% semantic) while reducing costs by up to 10x compared to GPT-4, potentially solving bottlenecks in Security Operations Centers.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 6
🧠Researchers developed a learned scheduler for masked diffusion models (MDMs) in language modeling that outperforms traditional rule-based approaches. The new method uses a KL-regularized Markov decision process framework and demonstrated significant improvements, including 20.1% gains over random scheduling and 11.2% over max-confidence approaches on benchmark tests.
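The summary names a KL-regularized Markov decision process but does not give its formulation. As a hedged reference point, the generic KL-regularized objective is shown below. Its mapping onto this setting is an assumption: the state $s_t$ would be the partially unmasked sequence, the action $a_t$ the choice of positions to unmask next, $r$ a task reward, $\pi_{\mathrm{ref}}$ a reference scheduler (plausibly a rule-based one), and $\beta > 0$ the penalty strength.

```latex
\max_{\pi}\;
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]
\;-\;
\beta\,
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T-1}
\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\big\|\,\pi_{\mathrm{ref}}(\cdot \mid s_t)\big)\right]

% Single-decision (one-step) case: the maximizer has a closed form
\pi^{*}(a \mid s)
= \frac{\pi_{\mathrm{ref}}(a \mid s)\,\exp\!\big(r(s,a)/\beta\big)}
       {\sum_{a'} \pi_{\mathrm{ref}}(a' \mid s)\,\exp\!\big(r(s,a')/\beta\big)}
```

The one-step form shows the role of $\beta$. A large $\beta$ keeps the learned scheduler close to the reference scheduler. A small $\beta$ lets it concentrate on high-reward unmasking choices.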
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers introduce RELOOP, a new retrieval-augmented generation framework that improves multi-step question answering across text, tables, and knowledge graphs. The system uses hierarchical sequences and structure-aware iteration to achieve better accuracy while reducing computational costs compared to existing RAG methods.
AI · Neutral · Apple Machine Learning · Feb 25 · 6/10 · 3
🧠Research identifies a significant performance gap between speech-adapted Large Language Models and their text-based counterparts on language understanding tasks. Current approaches to bridge this gap rely on expensive large-scale speech synthesis methods, highlighting a key challenge in extending LLM capabilities to audio inputs.
AI · Bullish · Hugging Face Blog · Feb 27 · 6/10 · 5
🧠Hugging Face has partnered with the Indian Institute of Science (IISc) to advance AI model development for India's diverse linguistic landscape. The collaboration aims to improve natural language processing across multiple Indian languages, potentially expanding AI accessibility in the region.
AI · Bullish · OpenAI News · Mar 15 · 6/10 · 6
🧠OpenAI has released new versions of GPT-3 and Codex with enhanced capabilities that allow users to edit and insert content into existing text, rather than only completing text. This represents a significant advancement in AI text editing functionality beyond traditional text generation.