AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduced NRLB, a multi-agent AI framework designed to create plain language summaries accessible to diverse reader groups including elementary students, non-native speakers, and those with attention deficits. The system combines template-based planning with iterative refinement to improve readability while maintaining factual accuracy, achieving human preference rates of 55-76% in evaluations.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠ESC-Skills introduces a novel framework for emotional support conversation systems that moves beyond end-to-end generation to create interpretable, executable skills. The system discovers support interventions from successful and failed dialogues, organizes them into a skills bank with applicability conditions and risk assessments, then self-improves through multi-profile simulations and systematic failure analysis.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠MetaboT is an open-source LLM-based framework that translates natural-language questions into SPARQL queries for metabolomics knowledge graphs, significantly lowering technical barriers for researchers without programming expertise. The multi-agent architecture addresses hallucination and schema-compliance issues through specialized agents for validation, entity resolution, and query refinement, validated on the Experimental Natural Products Knowledge Graph.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce N2I-RAG, an AI framework that automates computation of legal indicators from normative texts using retrieval-augmented generation with built-in validation mechanisms. The system addresses hallucination risks in traditional language models by emphasizing traceability and evidence grounding, demonstrating strong performance on French marine environmental law.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠SEAL introduces a two-stage semantic parsing framework that combines large language models with agentic learning to improve conversational question answering over knowledge graphs. The system self-evolves through dialog history and execution feedback without retraining, achieving state-of-the-art results on complex multi-hop reasoning and aggregation tasks while reducing computational costs.
AI · Neutral · arXiv – CS AI · May 12 · 5/10
🧠Researchers evaluate semantic search as a tool for analyzing 18th-century intellectual history, specifically tracking how John Locke's ideas circulated through paraphrases and implicit references. While semantic search substantially outperforms traditional lexical methods at capturing meaning-level correspondences, linguistic analysis reveals that retrieval remains constrained by surface-level vocabulary overlap, suggesting both promise and limitations for historical corpus analysis.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Semantic Softmax, a novel inference-time method that improves zero-shot LLM classification by recovering probability mass lost during constrained decoding. The approach aggregates scores from semantic synonyms, reducing calibration errors and boosting accuracy on emotion and toxicity detection tasks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduced TrajPrism, a comprehensive benchmark dataset combining 300K real urban trajectories with natural language annotations across three cities, enabling AI models to understand the alignment between physical travel paths and human descriptions of movement intent, constraints, and preferences.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers evaluated multiple code retrieval strategies using LLM-based rewriting, finding that full natural language transcription with query-corpus augmentation achieves the largest gains but corpus-only approaches often degrade performance. They introduced Delta H (token entropy) as a cheap, rewriter-agnostic metric to predict when LLM rewriting justifies its computational cost.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduced Magis-Bench, a new benchmark for evaluating large language models on magistrate-level judicial tasks based on Brazilian competitive exams. Testing 23 state-of-the-art LLMs revealed that even top performers like Google's Gemini-3-Pro-Preview score below 70% on complex legal reasoning and judicial writing tasks, indicating significant gaps in AI legal capabilities.
🧠 Claude · 🧠 Gemini
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce Lexical Acoustic Coding (LAC), a framework enabling LLM agents to transmit audio through natural language by converting sound into interpretable acoustic descriptors and verbalizing them as English text. The approach frames audio transmission as a quantization problem, balancing vocabulary size, transmission rate, and fidelity while keeping the transmitted text editable and human-readable.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers conducted a systematic evaluation of large language models for part-of-speech tagging in Medieval Romance languages, comparing them against traditional taggers. The study demonstrates that LLM-based approaches with fine-tuning and cross-lingual transfer learning significantly outperform conventional methods, offering practical applications for digital humanities research on historical texts.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose RRCM, a novel framework that enhances Large Language Model-based recommendation systems by dynamically retrieving relevant collaborative and metadata information. The system learns optimal context construction through ranking-driven optimization, addressing key challenges in balancing context quality with efficiency limitations.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers developed a toxicity detection system for gaming chat using fine-tuned Llama 3.1 with synthetic data augmentation, achieving 4th place in the EEUCA 2026 shared task. The system classifies messages into six toxicity categories and reveals a critical "validation trap" phenomenon where high validation performance doesn't correlate with strong test set generalization.
🧠 Llama
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠Researchers present an end-to-end framework that uses Large Language Models to convert natural language specifications into PDDL planning models, with iterative refinement through hardcoded and dynamic agents, then generates executable plans. The system demonstrates strong performance across multiple domains including classic planning problems where LLMs typically struggle, and integrates with established planning engines.
🧠 Gemini
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce StoryRMB, the first benchmark for evaluating reward models on story generation preferences, and develop StoryReward, a specialized reward model achieving 66.3% accuracy where existing models struggle. The work addresses the challenge of modeling subjective human preferences in narrative generation, enabling better alignment between LLM-generated stories and human expectations.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers have developed an agentic framework that uses knowledge graphs to help large language models understand and reason about AI policy documents. The system was tested on multiple AI safety regulations, demonstrating that knowledge graph augmentation improves LLM performance across various reasoning tasks from simple entity lookup to complex cross-policy inference.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠A comprehensive survey examines how large language models can assist or automate peer review processes across academia, synthesizing techniques for review generation, post-review tasks, and evaluation methods. The research catalogs datasets and modeling approaches while addressing ethical concerns and practical implementation challenges for integrating AI into scholarly publishing workflows.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose a novel framework treating Large Language Models as attention-informed Neural Topic Models, enabling interpretable topic extraction from documents. The approach combines white-box interpretability analysis with black-box long-context LLM capabilities, demonstrating competitive performance on topic modeling tasks while maintaining semantic clarity.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠VeriTrans is a machine learning system that converts natural language requirements into formal logic suitable for automated solvers, using a validator-gated pipeline to ensure reliability. Achieving 94.46% correctness on 2,100 specifications, the system combines fine-tuned language models with round-trip verification and deterministic execution, enabling auditable translation for critical applications.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers present a novel approach using agentic language model feedback frameworks to generate planning domains from natural language descriptions augmented with symbolic information. The method employs heuristic search over model space optimized by various feedback mechanisms, including landmarks and plan validator outputs, to improve domain quality for practical deployment.
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10
🧠 Researchers demonstrate that Large Language Models used as judges suffer from score range bias, where evaluation outputs are highly sensitive to the predefined scoring scale. Using contrastive decoding techniques, they achieve an improvement of up to 11.7% in alignment with human judgments across different score ranges.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers propose ScalDPP, a new retrieval mechanism for RAG systems that uses Determinantal Point Processes to optimize both density and diversity in context selection. The approach addresses limitations in current RAG pipelines that ignore interactions between retrieved information chunks, leading to redundant contexts that reduce effectiveness.
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠Researchers propose a fully end-to-end training paradigm for temporal sentence grounding in videos, introducing the Sentence Conditioned Adapter (SCADA) to better align video understanding with natural language queries. The method outperforms existing approaches by jointly optimizing video backbones and localization components rather than using frozen pre-trained encoders.
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers propose new uncertainty elicitation techniques for large language models, using the imprecise probabilities framework to better capture higher-order uncertainty. The approach addresses systematic failures in ambiguous question-answering and self-reflection by quantifying both first-order uncertainty over responses and second-order uncertainty about the probability model itself.