y0news

#natural-language-processing News & Analysis

113 articles tagged with #natural-language-processing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand

Researchers introduced NRLB, a multi-agent AI framework designed to create plain language summaries accessible to diverse reader groups including elementary students, non-native speakers, and those with attention deficits. The system combines template-based planning with iterative refinement to improve readability while maintaining factual accuracy, achieving human preference rates of 55-76% in evaluations.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

ESC-Skills introduces a novel framework for emotional support conversation systems that moves beyond end-to-end generation to create interpretable, executable skills. The system discovers support interventions from successful and failed dialogues, organizes them into a skills bank with applicability conditions and risk assessments, then self-improves through multi-profile simulations and systematic failure analysis.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs

MetaboT is an open-source LLM-based framework that translates natural-language questions into SPARQL queries for metabolomics knowledge graphs, significantly lowering technical barriers for researchers without programming expertise. The multi-agent architecture addresses hallucination and schema-compliance issues through specialized agents for validation, entity resolution, and query refinement, validated on the Experimental Natural Products Knowledge Graph.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

From Norms to Indicators (N2I-RAG): An Agentic Retrieval-Augmented Generation Framework for Legal Indicator Computation

Researchers introduce N2I-RAG, an AI framework that automates computation of legal indicators from normative texts using retrieval-augmented generation with built-in validation mechanisms. The system addresses hallucination risks in traditional language models by emphasizing traceability and evidence grounding, demonstrating strong performance on French marine environmental law.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs

SEAL introduces a two-stage semantic parsing framework that combines large language models with agentic learning to improve conversational question answering over knowledge graphs. The system self-evolves through dialog history and execution feedback without retraining, achieving state-of-the-art results on complex multi-hop reasoning and aggregation tasks while reducing computational costs.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke

Researchers evaluate semantic search as a tool for analyzing 18th-century intellectual history, specifically tracking how John Locke's ideas circulated through paraphrases and implicit references. While semantic search substantially outperforms traditional lexical methods at capturing meaning-level correspondences, linguistic analysis reveals that retrieval remains constrained by surface-level vocabulary overlap, suggesting both promise and limitations for historical corpus analysis.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

Researchers propose Semantic Softmax, a novel inference-time method that improves zero-shot LLM classification by recovering probability mass lost during constrained decoding. The approach aggregates scores from semantic synonyms, reducing calibration errors and boosting accuracy on emotion and toxicity detection tasks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

Researchers introduced TrajPrism, a comprehensive benchmark dataset combining 300K real urban trajectories with natural language annotations across three cities, enabling AI models to understand the alignment between physical travel paths and human descriptions of movement intent, constraints, and preferences.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Do not copy and paste! Rewriting strategies for code retrieval

Researchers evaluated multiple code retrieval strategies using LLM-based rewriting, finding that full natural language transcription with query-corpus augmentation achieves the largest gains but corpus-only approaches often degrade performance. They introduced Delta H (token entropy) as a cheap, rewriter-agnostic metric to predict when LLM rewriting justifies its computational cost.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

Researchers introduced Magis-Bench, a new benchmark for evaluating large language models on magistrate-level judicial tasks based on Brazilian competitive exams. Testing 23 state-of-the-art LLMs revealed that even top performers like Google's Gemini-3-Pro-Preview score below 70% on complex legal reasoning and judicial writing tasks, indicating significant gaps in AI legal capabilities.

Claude · Gemini

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Communicating Sound Through Natural Language

Researchers introduce Lexical Acoustic Coding (LAC), a framework enabling LLM agents to transmit audio through natural language by converting sound into interpretable acoustic descriptors and verbalizing them as English text. The approach frames audio transmission as a quantization problem, balancing vocabulary size, transmission rate, and fidelity while keeping the transmitted text editable and human-readable.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

Researchers conducted a systematic evaluation of large language models for part-of-speech tagging in Medieval Romance languages, comparing them against traditional taggers. The study demonstrates that LLM-based approaches with fine-tuning and cross-lingual transfer learning significantly outperform conventional methods, offering practical applications for digital humanities research on historical texts.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation

Researchers propose RRCM, a novel framework that enhances Large Language Model-based recommendation systems by dynamically retrieving relevant collaborative and metadata information. The system learns optimal context construction through ranking-driven optimization, addressing key challenges in balancing context quality with efficiency limitations.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat

Researchers developed a toxicity detection system for gaming chat using fine-tuned Llama 3.1 with synthetic data augmentation, achieving 4th place in the EEUCA 2026 shared task. The system classifies messages into six toxicity categories and reveals a critical "validation trap" phenomenon where high validation performance doesn't correlate with strong test set generalization.

Llama

AI · Bullish · arXiv – CS AI · May 11 · 6/10

End-to-end PDDL Planning with Hardcoded and Dynamic Agents

Researchers present an end-to-end framework that uses Large Language Models to convert natural language specifications into PDDL planning models, with iterative refinement through hardcoded and dynamic agents, then generates executable plans. The system demonstrates strong performance across multiple domains including classic planning problems where LLMs typically struggle, and integrates with established planning engines.

Gemini

AI · Neutral · arXiv – CS AI · May 7 · 6/10

StoryAlign: Evaluating and Training Reward Models for Story Generation

Researchers introduce StoryRMB, the first benchmark for evaluating reward models on story generation preferences, and develop StoryReward, a specialized reward model achieving 66.3% accuracy where existing models struggle. The work addresses the challenge of modeling subjective human preferences in narrative generation, enabling better alignment between LLM-generated stories and human expectations.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

Researchers have developed an agentic framework that uses knowledge graphs to help large language models understand and reason about AI policy documents. The system was tested on multiple AI safety regulations, demonstrating that knowledge graph augmentation improves LLM performance across various reasoning tasks from simple entity lookup to complex cross-policy inference.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future

A comprehensive survey examines how large language models can assist or automate peer review processes across academia, synthesizing techniques for review generation, post-review tasks, and evaluation methods. The research catalogs datasets and modeling approaches while addressing ethical concerns and practical implementation challenges for integrating AI into scholarly publishing workflows.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability

Researchers propose a novel framework treating Large Language Models as attention-informed Neural Topic Models, enabling interpretable topic extraction from documents. The approach combines white-box interpretability analysis with black-box long-context LLM capabilities, demonstrating competitive performance on topic modeling tasks while maintaining semantic clarity.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline

VeriTrans is a machine learning system that converts natural language requirements into formal logic suitable for automated solvers, using a validator-gated pipeline to ensure reliability. Achieving 94.46% correctness on 2,100 specifications, the system combines fine-tuned language models with round-trip verification and deterministic execution, enabling auditable translation for critical applications.

$PL · $NL · $CNF

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Model Space Reasoning as Search in Feedback Space for Planning Domain Generation

Researchers present a novel approach using agentic language model feedback frameworks to generate planning domains from natural language descriptions augmented with symbolic information. The method employs heuristic search over model space optimized by various feedback mechanisms, including landmarks and plan validator outputs, to improve domain quality for practical deployment.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Researchers demonstrate that Large Language Models used as judges suffer from score range bias, where evaluation outputs are highly sensitive to predefined scoring scales. Using contrastive decoding techniques, they achieve up to 11.7% improvement in alignment with human judgments across different score ranges.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Scaling DPPs for RAG: Density Meets Diversity

Researchers propose ScalDPP, a new retrieval mechanism for RAG systems that uses Determinantal Point Processes to optimize both density and diversity in context selection. The approach addresses limitations in current RAG pipelines that ignore interactions between retrieved information chunks, leading to redundant contexts that reduce effectiveness.

AI · Bullish · arXiv – CS AI · Apr 6 · 6/10

A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos

Researchers propose a fully end-to-end training paradigm for temporal sentence grounding in videos, introducing the Sentence Conditioned Adapter (SCADA) to better align video understanding with natural language queries. The method outperforms existing approaches by jointly optimizing video backbones and localization components rather than using frozen pre-trained encoders.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Researchers propose new uncertainty elicitation techniques for large language models, using an imprecise-probabilities framework to better capture higher-order uncertainty. The approach addresses systematic failures in ambiguous question answering and self-reflection by quantifying both first-order uncertainty over responses and second-order uncertainty about the probability model itself.
