956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers present OMNIA, a two-stage AI approach that combines structural and semantic reasoning to improve Knowledge Graph Completion using Large Language Models. The method clusters semantically related entities and validates them through embedding filtering and LLM-based validation, showing significant improvements in F1-scores compared to traditional models.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers propose CAP-TTA, a test-time adaptation framework that helps debiased large language models better handle unfamiliar toxic prompts that cause distribution shifts. The method uses context-aware LoRA updates triggered by bias-risk thresholds to reduce toxic outputs while maintaining narrative fluency and reducing computational latency.
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers propose an Iterative Semantic Reasoning Framework (ISRF) that uses large language models to improve recommendation systems by bridging explicit individual user interests with implicit group interests. The framework employs multi-step bidirectional reasoning and iterative optimization to achieve better user interest modeling than existing methods.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a hybrid AI architecture combining machine learning and retrieval-augmented generation (RAG) for personalized financial services marketing. The system uses temporal modeling and intent prediction to create compliant, auditable customer communications while improving personalization accuracy.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers introduced SKILLS, a benchmark framework testing whether large language models can execute telecommunications operations through APIs with or without structured domain guidance. The study evaluated 5 open-weight models across 37 telecom scenarios, showing consistent performance improvements when models were augmented with domain-specific guidance documents.
AI · Neutral · arXiv · CS AI · Mar 17 · 4/10
🧠 Research from arXiv examines how large language models generate multiple-choice distractors for educational assessments by modeling incorrect student reasoning. The study finds LLMs surprisingly align with educational best practices, first solving problems correctly then simulating misconceptions, with failures primarily occurring in solution recovery and candidate selection rather than error simulation.
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers have published a comprehensive review of methods for integrating large language models (LLMs) into virtual reality environments to create more realistic digital humans with personality traits. The study explores various approaches including zero-shot, few-shot, and fine-tuning methods while highlighting challenges like computational demands and latency issues that need to be addressed for practical applications.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a comprehensive benchmarking system to evaluate AI agent performance in single-cell omics analysis, testing 50 real-world tasks across multiple frameworks. The study found that Grok3-beta achieved state-of-the-art performance, while multi-agent frameworks significantly outperformed single-agent approaches through specialized role division.
🧠 Grok
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a reproducible pipeline to transform public Zoom recordings into speaker-attributed transcripts for training LLMs to simulate realistic civic deliberations. The method achieved a 67% reduction in perplexity and nearly doubled performance metrics, with human evaluations showing simulations often indistinguishable from real government meetings.
🏢 Perplexity
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers introduce Jacobian Scopes, a new gradient-based method for interpreting how individual tokens influence Large Language Model predictions. The technique uses perturbation theory and information geometry to reveal model biases, translation strategies, and learning mechanisms, with open-source implementations and an interactive demo available.
🏢 Hugging Face
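The core idea behind gradient-based token attribution can be sketched on a toy model. The snippet below is a minimal illustration, not the paper's Jacobian Scopes implementation: it numerically computes the Jacobian of output logits with respect to each input token embedding and uses its Frobenius norm as an influence score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": mean-pool token embeddings, then a linear head to vocab logits.
# This stand-in architecture is invented for illustration only.
d, vocab, seq_len = 8, 5, 4
W = rng.normal(size=(vocab, d))    # linear head
E = rng.normal(size=(seq_len, d))  # input token embeddings

def logits(emb):
    return W @ emb.mean(axis=0)

def token_jacobian(emb, t, eps=1e-5):
    # Central-difference Jacobian of the logits w.r.t. token t's embedding,
    # mirroring what autodiff would compute for a real network.
    J = np.zeros((vocab, d))
    for i in range(d):
        bump = np.zeros_like(emb)
        bump[t, i] = eps
        J[:, i] = (logits(emb + bump) - logits(emb - bump)) / (2 * eps)
    return J

# Influence score per token: Frobenius norm of its Jacobian block.
scores = [np.linalg.norm(token_jacobian(E, t)) for t in range(seq_len)]
print(scores)
```

For this linear toy model every token's Jacobian block is W / seq_len, so all scores coincide; in a real transformer the blocks differ per position, which is what makes the norms informative.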
AI · Neutral · arXiv · CS AI · Mar 17 · 4/10
🧠 Researchers developed Agora, an AI-powered platform using LLMs to help users practice consensus-finding skills on policy issues by organizing human voices and providing feedback. A preliminary study with 44 university students showed participants using the full interface reported higher problem-solving skills and produced better consensus statements compared to controls.
AI · Neutral · The Register · AI · Mar 16 · 5/10
🧠 The Free Software Foundation is advocating for open-source, community-developed AI models ("free-range LLMs") as an alternative to proprietary AI systems developed by large corporations ("factory-farmed AI"). This represents a push for democratization and transparency in AI development, emphasizing user freedom and community control over AI technology.
AI · Neutral · arXiv · CS AI · Mar 16 · 4/10
🧠 Researchers conducted a mixed-methods study evaluating an LLM-powered BPMN modeling copilot with five domain experts, revealing acceptable usability (67.2/100) but significantly lower trust levels (48.8%). The study highlights critical reliability concerns and demonstrates the need for human-centered evaluation methods beyond automated benchmarking for LLM business tools.
🏢 Microsoft
AI · Neutral · arXiv · CS AI · Mar 16 · 4/10
🧠 Researchers developed an automated query expansion framework using multiple large language models that constructs domain-specific examples without manual intervention. The system uses a two-LLM ensemble approach where different models generate expansions that are then refined by a third LLM, showing significant improvements over traditional methods across multiple datasets.
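The generate-then-refine ensemble pattern described above can be sketched with stub functions standing in for the LLM calls. `expand_a`, `expand_b`, and `refine` are hypothetical placeholders, not the paper's models or prompts; a real system would replace each with a model invocation.

```python
# Two generator "LLMs" propose expansions; a third "LLM" refines the union.
# All three are deterministic stubs for illustration only.
def expand_a(query: str) -> list[str]:
    # Generator 1: tutorial/overview-style expansions (stub).
    return [f"{query} tutorial", f"{query} overview"]

def expand_b(query: str) -> list[str]:
    # Generator 2: domain-specific expansions (stub); note the overlap with A.
    return [f"{query} benchmark", f"{query} tutorial"]

def refine(query: str, candidates: list[str]) -> list[str]:
    # Refiner: deduplicate and keep only on-topic candidates; a real refiner
    # LLM would score relevance instead of substring-matching.
    seen, kept = set(), []
    for c in candidates:
        if c not in seen and query in c:
            seen.add(c)
            kept.append(c)
    return kept

def expand_query(query: str) -> list[str]:
    return refine(query, expand_a(query) + expand_b(query))

print(expand_query("knowledge graph completion"))
```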
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 A study evaluates offline large language models for Turkish heritage language education, testing 14 models from 270M to 32B parameters using a Turkish Anomaly Suite. The research finds that 8B-14B parameter reasoning-oriented models offer the best cost-safety balance for educational use, while model size alone doesn't determine anomaly resistance.
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 Researchers developed an automated framework to evaluate Large Language Models' effectiveness in translating Mandarin Chinese to English, comparing GPT-4, GPT-4o, and DeepSeek against Google Translate. While LLMs performed well on news translation, they showed varying results with literary texts, with DeepSeek excelling at cultural subtleties and GPT-4o/DeepSeek better at semantic conservation.
🏢 Meta · 🧠 GPT-4
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 Researchers introduce EvoSchema, a comprehensive benchmark to test how well text-to-SQL AI models handle database schema changes over time. The study reveals that table-level changes significantly impact model performance more than column-level modifications, and proposes training methods to improve model robustness in dynamic database environments.
AI · Bullish · arXiv · CS AI · Mar 11 · 5/10
🧠 Researchers developed a chatbot based on Google Gemini 2.0 Flash that automatically generates and solves electromagnetic simulation models, significantly reducing setup time. The system uses Python to coordinate between workflow components and can handle various conductor geometries while providing custom post-processing capabilities.
🧠 Gemini
AI · Bullish · arXiv · CS AI · Mar 11 · 5/10
🧠 Researchers developed ELERAG, an enhanced Retrieval-Augmented Generation architecture that integrates Entity Linking with Wikidata to improve factual accuracy in educational AI systems. The system shows significant performance improvements in domain-specific contexts compared to standard RAG approaches, particularly for Italian educational question-answering applications.
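The entity-linking-augmented retrieval idea can be sketched as follows: spans in the question are linked to knowledge-base entries, and the linked facts are prepended to the retrieved passages before generation. The tiny in-memory `KB` and the substring linker below are illustrative stand-ins for Wikidata and a real entity linker, not ELERAG's actual components.

```python
# Miniature "Wikidata": label -> fact string (invented entries for illustration).
KB = {
    "Dante": "Dante Alighieri (Q1067): Italian poet, author of the Divine Comedy.",
    "Florence": "Florence (Q2044): city in Tuscany, Italy.",
}

def link_entities(question: str) -> list[str]:
    # Naive linker: exact surface-form match against KB labels.
    return [label for label in KB if label in question]

def build_context(question: str, retrieved: list[str]) -> str:
    # Prepend linked facts to the retrieved passages; an LLM would then
    # answer from this combined context.
    facts = [KB[label] for label in link_entities(question)]
    return "\n".join(facts + retrieved)

ctx = build_context(
    "Where was Dante born?",
    ["Passage: Dante was born in Florence in 1265."],
)
print(ctx)
```

Only entities mentioned in the question are linked, so the context stays compact; grounding answers in explicit KB facts is what drives the factual-accuracy gains the summary describes.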
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 Researchers developed PyPDDLEngine, an open-source tool that allows large language models to perform task planning through interactive PDDL simulation. Testing on 102 planning problems showed agentic LLM planning achieved 66.7% success versus 63.7% for direct LLM planning, but at 5.7x higher token cost, while classical planning methods reached 85.3% success.
🧠 Claude · 🧠 Haiku
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 A research paper examines challenges in human-data interaction systems as AI transforms data analysis with large-scale, multimodal datasets and foundation models like LLMs and VLMs. The study identifies key issues including scalability constraints, interaction paradigm limitations, and uncertainty in AI-generated insights, calling for redefined human-machine roles in analytical workflows.
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 Researchers introduced TML-Bench, a new benchmark for evaluating AI coding agents on tabular machine learning tasks similar to Kaggle competitions. The study tested 10 open-source language models across four competitions with different time budgets, finding that MiniMax-M2.1 achieved the best overall performance.
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 Researchers propose a new reinforcement learning approach for large language models that optimizes for subsets of future rewards rather than full sequences. The method enables comparison of different policy classes and shows varying effectiveness across different conversational AI alignment tasks.
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 A 4-week study comparing bandit algorithms and LLM architectures for personalized health behavior interventions found that LLM-based messaging approaches were rated more helpful than templates, but contextual bandit optimization provided no additional benefit over LLM-only methods. The research reveals a trade-off between structured exploration of behavior change techniques and generative flexibility in AI health systems.
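The "structured exploration" half of that trade-off can be illustrated with a toy epsilon-greedy bandit choosing between a template message and an LLM-generated one. The arms, reward rates, and schedule below are invented for illustration and are not the study's setup.

```python
import random

random.seed(0)

arms = ["template_msg", "llm_msg"]
# Assumed per-arm helpfulness rates for the simulation (not measured values).
true_helpfulness = {"template_msg": 0.4, "llm_msg": 0.7}
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}

def pull(arm: str) -> float:
    # Simulated binary "was this message helpful?" feedback.
    return 1.0 if random.random() < true_helpfulness[arm] else 0.0

for step in range(500):
    eps = 0.1
    # Explore with probability eps, otherwise pick the current best estimate.
    arm = random.choice(arms) if random.random() < eps else max(values, key=values.get)
    r = pull(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update

print(values)
```

A contextual bandit would additionally condition the arm choice on user features; the study's finding is that this extra machinery added no benefit over simply sending the LLM-generated messages.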
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 Researchers investigate how Large Language Models (LLMs) perform in abductive reasoning tasks, which involve drawing tentative conclusions from limited information. The study converts syllogistic datasets to test whether state-of-the-art LLMs exhibit biases in abductive reasoning, aiming to bridge the gap between machine and human cognition.