956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers present OMNIA, a two-stage AI approach that combines structural and semantic reasoning to improve Knowledge Graph Completion using Large Language Models. The method clusters semantically related entities and validates them through embedding filtering and LLM-based validation, showing significant improvements in F1-scores compared to traditional models.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers propose CAP-TTA, a test-time adaptation framework that helps debiased large language models better handle unfamiliar toxic prompts that cause distribution shifts. The method uses context-aware LoRA updates triggered by bias-risk thresholds to reduce toxic outputs while maintaining narrative fluency and reducing computational latency.
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers propose an Iterative Semantic Reasoning Framework (ISRF) that uses large language models to improve recommendation systems by bridging explicit individual user interests with implicit group interests. The framework employs multi-step bidirectional reasoning and iterative optimization to achieve better user interest modeling than existing methods.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a hybrid AI architecture combining machine learning and retrieval-augmented generation (RAG) for personalized financial services marketing. The system uses temporal modeling and intent prediction to create compliant, auditable customer communications while improving personalization accuracy.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers introduced SKILLS, a benchmark framework testing whether large language models can execute telecommunications operations through APIs with or without structured domain guidance. The study evaluated 5 open-weight models across 37 telecom scenarios, showing consistent performance improvements when models were augmented with domain-specific guidance documents.
AI · Neutral · arXiv · CS AI · Mar 17 · 4/10
🧠 Research from arXiv examines how large language models generate multiple-choice distractors for educational assessments by modeling incorrect student reasoning. The study finds LLMs surprisingly align with educational best practices, first solving problems correctly then simulating misconceptions, with failures primarily occurring in solution recovery and candidate selection rather than error simulation.
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers have published a comprehensive review of methods for integrating large language models (LLMs) into virtual reality environments to create more realistic digital humans with personality traits. The study explores various approaches including zero-shot, few-shot, and fine-tuning methods while highlighting challenges like computational demands and latency issues that need to be addressed for practical applications.
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a comprehensive benchmarking system to evaluate AI agent performance in single-cell omics analysis, testing 50 real-world tasks across multiple frameworks. The study found that Grok3-beta achieved state-of-the-art performance, while multi-agent frameworks significantly outperformed single-agent approaches through specialized role division.
🧠 Grok
AI · Bullish · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers developed a reproducible pipeline to transform public Zoom recordings into speaker-attributed transcripts for training LLMs to simulate realistic civic deliberations. The method achieved a 67% reduction in perplexity and nearly doubled performance metrics, with human evaluations showing simulations often indistinguishable from real government meetings.
🏢 Perplexity
AI · Neutral · arXiv · CS AI · Mar 17 · 5/10
🧠 Researchers introduce Jacobian Scopes, a new gradient-based method for interpreting how individual tokens influence Large Language Model predictions. The technique uses perturbation theory and information geometry to reveal model biases, translation strategies, and learning mechanisms, with open-source implementations and an interactive demo available.
🏢 Hugging Face
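The core idea behind gradient-based token attribution can be sketched on a toy model. The snippet below is a minimal illustration, not the paper's Jacobian Scopes implementation: it numerically computes the Jacobian of output logits with respect to each input token embedding and uses its Frobenius norm as an influence score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": mean-pool token embeddings, then a linear head to vocab logits.
# This stand-in architecture is invented for illustration only.
d, vocab, seq_len = 8, 5, 4
W = rng.normal(size=(vocab, d))    # linear head
E = rng.normal(size=(seq_len, d))  # input token embeddings

def logits(emb):
    return W @ emb.mean(axis=0)

def token_jacobian(emb, t, eps=1e-5):
    # Central-difference Jacobian of the logits w.r.t. token t's embedding,
    # mirroring what autodiff would compute for a real network.
    J = np.zeros((vocab, d))
    for i in range(d):
        bump = np.zeros_like(emb)
        bump[t, i] = eps
        J[:, i] = (logits(emb + bump) - logits(emb - bump)) / (2 * eps)
    return J

# Influence score per token: Frobenius norm of its Jacobian block.
scores = [np.linalg.norm(token_jacobian(E, t)) for t in range(seq_len)]
print(scores)
```

For this linear toy model every token's Jacobian block is W / seq_len, so all scores coincide; in a real transformer the blocks differ per position, which is what makes the norms informative.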
AI · Neutral · arXiv · CS AI · Mar 17 · 4/10
🧠 Researchers developed Agora, an AI-powered platform using LLMs to help users practice consensus-finding skills on policy issues by organizing human voices and providing feedback. A preliminary study with 44 university students showed participants using the full interface reported higher problem-solving skills and produced better consensus statements compared to controls.
AI · Neutral · The Register · AI · Mar 16 · 5/10
🧠 The Free Software Foundation is advocating for open-source, community-developed AI models ("free-range LLMs") as an alternative to proprietary AI systems developed by large corporations ("factory-farmed AI"). This represents a push for democratization and transparency in AI development, emphasizing user freedom and community control over AI technology.
AI · Neutral · arXiv · CS AI · Mar 16 · 4/10
🧠 Researchers conducted a mixed-methods study evaluating an LLM-powered BPMN modeling copilot with five domain experts, revealing acceptable usability (67.2/100) but significantly lower trust levels (48.8%). The study highlights critical reliability concerns and demonstrates the need for human-centered evaluation methods beyond automated benchmarking for LLM business tools.
🏢 Microsoft
AI · Neutral · arXiv · CS AI · Mar 16 · 4/10
🧠 Researchers developed an automated query expansion framework using multiple large language models that constructs domain-specific examples without manual intervention. The system uses a two-LLM ensemble approach where different models generate expansions that are then refined by a third LLM, showing significant improvements over traditional methods across multiple datasets.
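The generate-then-refine ensemble pattern described above can be sketched with stub functions standing in for the LLM calls. `expand_a`, `expand_b`, and `refine` are hypothetical placeholders, not the paper's models or prompts; a real system would replace each with a model invocation.

```python
# Two generator "LLMs" propose expansions; a third "LLM" refines the union.
# All three are deterministic stubs for illustration only.
def expand_a(query: str) -> list[str]:
    # Generator 1: tutorial/overview-style expansions (stub).
    return [f"{query} tutorial", f"{query} overview"]

def expand_b(query: str) -> list[str]:
    # Generator 2: domain-specific expansions (stub); note the overlap with A.
    return [f"{query} benchmark", f"{query} tutorial"]

def refine(query: str, candidates: list[str]) -> list[str]:
    # Refiner: deduplicate and keep only on-topic candidates; a real refiner
    # LLM would score relevance instead of substring-matching.
    seen, kept = set(), []
    for c in candidates:
        if c not in seen and query in c:
            seen.add(c)
            kept.append(c)
    return kept

def expand_query(query: str) -> list[str]:
    return refine(query, expand_a(query) + expand_b(query))

print(expand_query("knowledge graph completion"))
```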
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 A study evaluates offline large language models for Turkish heritage language education, testing 14 models from 270M to 32B parameters using a Turkish Anomaly Suite. The research finds that 8B-14B parameter reasoning-oriented models offer the best cost-safety balance for educational use, while model size alone doesn't determine anomaly resistance.
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 Researchers developed an automated framework to evaluate Large Language Models' effectiveness in translating Mandarin Chinese to English, comparing GPT-4, GPT-4o, and DeepSeek against Google Translate. While LLMs performed well on news translation, they showed varying results with literary texts, with DeepSeek excelling at cultural subtleties and GPT-4o/DeepSeek better at semantic conservation.
🏢 Meta · 🧠 GPT-4
AI · Neutral · arXiv · CS AI · Mar 12 · 4/10
🧠 Researchers introduce EvoSchema, a comprehensive benchmark to test how well text-to-SQL AI models handle database schema changes over time. The study reveals that table-level changes significantly impact model performance more than column-level modifications, and proposes training methods to improve model robustness in dynamic database environments.
AI · Bullish · arXiv · CS AI · Mar 11 · 5/10
🧠 Researchers developed a chatbot based on Google Gemini 2.0 Flash that automatically generates and solves electromagnetic simulation models, significantly reducing setup time. The system uses Python to coordinate between workflow components and can handle various conductor geometries while providing custom post-processing capabilities.
🧠 Gemini
AI · Bullish · arXiv · CS AI · Mar 11 · 5/10
🧠 Researchers developed ELERAG, an enhanced Retrieval-Augmented Generation architecture that integrates Entity Linking with Wikidata to improve factual accuracy in educational AI systems. The system shows significant performance improvements in domain-specific contexts compared to standard RAG approaches, particularly for Italian educational question-answering applications.
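The entity-linking-augmented retrieval idea can be sketched as follows: spans in the question are linked to knowledge-base entries, and the linked facts are prepended to the retrieved passages before generation. The tiny in-memory `KB` and the substring linker below are illustrative stand-ins for Wikidata and a real entity linker, not ELERAG's actual components.

```python
# Miniature "Wikidata": label -> fact string (invented entries for illustration).
KB = {
    "Dante": "Dante Alighieri (Q1067): Italian poet, author of the Divine Comedy.",
    "Florence": "Florence (Q2044): city in Tuscany, Italy.",
}

def link_entities(question: str) -> list[str]:
    # Naive linker: exact surface-form match against KB labels.
    return [label for label in KB if label in question]

def build_context(question: str, retrieved: list[str]) -> str:
    # Prepend linked facts to the retrieved passages; an LLM would then
    # answer from this combined context.
    facts = [KB[label] for label in link_entities(question)]
    return "\n".join(facts + retrieved)

ctx = build_context(
    "Where was Dante born?",
    ["Passage: Dante was born in Florence in 1265."],
)
print(ctx)
```

Only entities mentioned in the question are linked, so the context stays compact; grounding answers in explicit KB facts is what drives the factual-accuracy gains the summary describes.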
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 Researchers developed PyPDDLEngine, an open-source tool that allows large language models to perform task planning through interactive PDDL simulation. Testing on 102 planning problems showed agentic LLM planning achieved 66.7% success versus 63.7% for direct LLM planning, but at 5.7x higher token cost, while classical planning methods reached 85.3% success.
🧠 Claude · 🧠 Haiku
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 A research paper examines challenges in human-data interaction systems as AI transforms data analysis with large-scale, multimodal datasets and foundation models like LLMs and VLMs. The study identifies key issues including scalability constraints, interaction paradigm limitations, and uncertainty in AI-generated insights, calling for redefined human-machine roles in analytical workflows.
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 Researchers introduced TML-Bench, a new benchmark for evaluating AI coding agents on tabular machine learning tasks similar to Kaggle competitions. The study tested 10 open-source language models across four competitions with different time budgets, finding that MiniMax-M2.1 achieved the best overall performance.
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 Researchers propose a new reinforcement learning approach for large language models that optimizes for subsets of future rewards rather than full sequences. The method enables comparison of different policy classes and shows varying effectiveness across different conversational AI alignment tasks.
AI · Neutral · arXiv · CS AI · Mar 9 · 4/10
🧠 A 4-week study comparing bandit algorithms and LLM architectures for personalized health behavior interventions found that LLM-based messaging approaches were rated more helpful than templates, but contextual bandit optimization provided no additional benefit over LLM-only methods. The research reveals a trade-off between structured exploration of behavior change techniques and generative flexibility in AI health systems.
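The "structured exploration" half of that trade-off can be illustrated with a toy epsilon-greedy bandit choosing between a template message and an LLM-generated one. The arms, reward rates, and schedule below are invented for illustration and are not the study's setup.

```python
import random

random.seed(0)

arms = ["template_msg", "llm_msg"]
# Assumed per-arm helpfulness rates for the simulation (not measured values).
true_helpfulness = {"template_msg": 0.4, "llm_msg": 0.7}
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}

def pull(arm: str) -> float:
    # Simulated binary "was this message helpful?" feedback.
    return 1.0 if random.random() < true_helpfulness[arm] else 0.0

for step in range(500):
    eps = 0.1
    # Explore with probability eps, otherwise pick the current best estimate.
    arm = random.choice(arms) if random.random() < eps else max(values, key=values.get)
    r = pull(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update

print(values)
```

A contextual bandit would additionally condition the arm choice on user features; the study's finding is that this extra machinery added no benefit over simply sending the LLM-generated messages.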
AI · Neutral · arXiv · CS AI · Mar 9 · 5/10
🧠 Researchers investigate how Large Language Models (LLMs) perform in abductive reasoning tasks, which involve drawing tentative conclusions from limited information. The study converts syllogistic datasets to test whether state-of-the-art LLMs exhibit biases in abductive reasoning, aiming to bridge the gap between machine and human cognition.