956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers introduce EvoKernel, a self-evolving AI framework that addresses the 'Data Wall' problem in deploying Large Language Models for kernel synthesis on data-scarce hardware platforms like NPUs. The system uses memory-based reinforcement learning with iterative refinement, improving correctness from 11% to 83% and achieving a 3.60x speedup.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers have developed LookaheadKV, a new framework that significantly improves memory efficiency in large language models by intelligently evicting less important cached data. The method achieves superior accuracy while reducing computational costs by up to 14.5x compared to existing approaches, making long-context AI tasks more practical.
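The eviction idea can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's actual LookaheadKV algorithm: the `ToyKVCache` class, the accumulated-attention scoring, and the protect-newest rule are all invented. The generic pattern it shows is scoring each cached entry by how much attention it receives and dropping the lowest-scoring entry once a budget is exceeded.

```python
class ToyKVCache:
    """Toy importance-based KV cache (illustrative only, not LookaheadKV)."""

    def __init__(self, budget):
        self.budget = budget
        self.entries = {}  # position -> (key, value, accumulated score)

    def add(self, pos, key, value):
        self.entries[pos] = (key, value, 0.0)
        if len(self.entries) > self.budget:
            self._evict(protect=pos)

    def observe_attention(self, weights):
        # weights: {position: attention weight from the newest query}
        for pos, w in weights.items():
            if pos in self.entries:
                k, v, s = self.entries[pos]
                self.entries[pos] = (k, v, s + w)

    def _evict(self, protect):
        # Drop the lowest-scoring entry, never the one just added.
        victim = min((p for p in self.entries if p != protect),
                     key=lambda p: self.entries[p][2])
        del self.entries[victim]

cache = ToyKVCache(budget=3)
for pos in range(3):
    cache.add(pos, f"k{pos}", f"v{pos}")
cache.observe_attention({0: 0.7, 1: 0.1, 2: 0.2})
cache.add(3, "k3", "v3")          # over budget: evicts position 1
print(sorted(cache.entries))      # [0, 2, 3]
```

Real eviction policies score entries with model-internal signals rather than a single query's weights, but the budget-and-evict loop has the same shape.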
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Research demonstrates that LoRA fine-tuning of large language models significantly improves text-to-speech systems, achieving up to 0.42 DNS-MOS gains and 34% SNR improvements when training data has sufficient acoustic diversity. The study establishes LoRA as an effective mechanism for speaker adaptation in compact LLM-based TTS systems, outperforming frozen base models across perceptual quality, speaker fidelity, and signal quality metrics.
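LoRA's mechanism itself is standard: freeze the base weight W and learn a low-rank update (alpha/r)·BA on top of it. A minimal numpy sketch of the generic adapter follows; the shapes, rank, and scaling constant are illustrative choices, and this is not the paper's TTS model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 4, 8

W = rng.normal(size=(d_out, d_in))             # frozen base weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection

def lora_forward(x):
    # Frozen path plus scaled low-rank adapter path.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapter starts as an exact no-op,
# so fine-tuning begins from the frozen model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B are updated during fine-tuning, which is why LoRA adaptation stays cheap even for compact LLM-based TTS backbones.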
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers present LLM Delegate Protocol (LDP), a new AI-native communication protocol for multi-agent LLM systems that introduces identity awareness, progressive payloads, and governance mechanisms. The protocol achieves 12x lower latency on simple tasks and 37% token reduction compared to existing protocols like A2A, though quality improvements remain limited in small delegate pools.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers developed BD-FDG, a framework for adapting large language models to complex engineering domains like space situational awareness. The method creates high-quality training datasets using structured knowledge organization and cognitive layering, resulting in SSA-LLM-8B, which shows 144-176% BLEU-1 improvements while maintaining general performance.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Research reveals that LLMs heavily concentrate their confidence scores on just three round numbers when using standard 0-100 scales, with over 78% of responses showing this pattern. The study demonstrates that using a 0-20 confidence scale significantly improves metacognitive efficiency compared to the conventional 0-100 format.
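The round-number clustering is straightforward to quantify. The sketch below uses invented confidence reports (not the study's data) to show the measurement idea: count what share of 0-100 reports fall on the three most common values, then coarsen to the 0-20 scale the study recommends.

```python
from collections import Counter

# Made-up 0-100 confidence reports (illustrative, not the study's data).
reports = [80, 90, 90, 95, 80, 70, 90, 80, 95, 90, 85, 80]

counts = Counter(reports)
top3 = sum(c for _, c in counts.most_common(3))
concentration = top3 / len(reports)
print(f"top-3 round values cover {concentration:.0%} of reports")  # 83%

# Coarsening to a 0-20 scale: a 21-point grid leaves fewer distinct
# "round" targets for the model to collapse onto.
rescaled = [round(r / 5) for r in reports]   # 0-100 -> 0-20
print(sorted(set(rescaled)))                 # [14, 16, 17, 18, 19]
```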
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers developed a method using Large Language Models to create personalized fake news debunking messages tailored to individuals' Big Five personality traits. The study found that personalized debunking messages are more persuasive than generic ones, with traits like Openness increasing persuadability while Neuroticism decreases it.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduce PRECEPT, a new framework for AI language model agents that improves knowledge retrieval and adaptation through structured rule learning and conflict-aware memory systems. The framework shows significant performance improvements over existing methods, with 41% better first-try accuracy and enhanced compositional reasoning capabilities.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers propose a framework using policy-parameterized prompts to influence multi-agent LLM dialogue behavior without training. The approach treats prompts as actions and dynamically constructs them through five components to control conversation flow based on metrics like responsiveness and stance shift.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers have introduced Turn, a new compiled programming language specifically designed for building autonomous AI agents that use large language models. The language includes built-in features like cognitive type safety, confidence operators, and actor-based process models to address common challenges in agentic software development.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduce Test-Driven AI Agent Definition (TDAD), a methodology that compiles AI agent prompts from behavioral specifications using automated testing. The approach addresses production deployment challenges by ensuring measurable behavioral compliance and preventing silent regressions in tool-using LLM agents.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers developed Arbiter, a framework to detect interference patterns in system prompts for LLM-based coding agents. Testing on major platforms (Claude, Codex, Gemini) revealed 152 findings and 21 interference patterns, with one discovery leading to a Google patch for Gemini CLI's memory system.
🏢 OpenAI · 🏢 Anthropic · 🧠 Claude
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 DuplexCascade introduces a VAD-free cascaded streaming pipeline that enables full-duplex speech-to-speech dialogue while maintaining LLM intelligence. The system converts traditional long utterance turns into micro-turn interactions using special control tokens to coordinate turn-taking and response timing.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduce a new framework showing that emotional tone in text systematically affects how large language models process and reason over information. They developed AURA-QA, an emotionally balanced dataset, and proposed emotional regularization techniques that improve reading comprehension performance across multiple benchmarks.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers propose TaSR-RAG, a new framework that improves Retrieval-Augmented Generation systems by using taxonomy-guided structured reasoning for better evidence selection. The system decomposes complex questions into triple sub-queries and performs step-wise evidence matching, achieving up to 14% performance improvements over existing RAG baselines on multi-hop question answering benchmarks.
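The decomposition idea can be sketched in miniature. Here the hand-written knowledge base, the example question, and the hop-chaining function are all stand-ins for what TaSR-RAG would produce with a taxonomy and an LLM: a multi-hop question becomes a chain of (subject, relation, object)-style sub-queries, each resolved against retrieved evidence before the next.

```python
def answer_multihop(hops, kb):
    """Resolve (subject, relation) hops in order, feeding each answer
    forward as the next hop's subject."""
    entity = hops[0][0]
    for _, relation in hops:
        entity = kb[(entity, relation)]
    return entity

# Tiny hand-written knowledge base standing in for retrieved evidence.
kb = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

# "Where was the director of Inception born?" decomposed into two
# triple-style sub-queries; the second subject is filled at run time.
hops = [("Inception", "directed_by"), (None, "born_in")]
print(answer_multihop(hops, kb))  # London
```

Step-wise matching of this kind is what lets a RAG system verify each intermediate hop instead of retrieving once for the whole question.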
AI · Bearish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers have identified a critical flaw in Large Language Models (LLMs) where they prioritize moral reasoning over commonsense understanding, struggling to detect logical contradictions within moral dilemmas. The study introduces the CoMoral benchmark and reveals a 'narrative focus bias' where LLMs better identify contradictions attributed to secondary characters rather than primary narrators.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers propose MM-tau-p², a new benchmark for evaluating multi-modal AI agents that adapt to user personas in customer service settings. The framework introduces 12 novel metrics to assess robustness and performance of LLM-based agents using voice and visual inputs, showing limitations even in advanced models like GPT-4 and GPT-5.
🧠 GPT-4 · 🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers propose MSSR (Memory-Inspired Sampler and Scheduler Replay), a new framework for continual fine-tuning of large language models that mitigates catastrophic forgetting while maintaining adaptability. The method estimates sample-level memory strength and schedules rehearsal at adaptive intervals, showing superior performance across three backbone models and 11 sequential tasks compared to existing replay-based strategies.
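The scheduling idea resembles spaced repetition, and that generic logic can be sketched in a few lines. The decay constant, threshold, and exponential decay model below are illustrative assumptions, not MSSR's actual memory-strength estimator: each sample's strength decays over training steps, and samples that fall below a threshold are queued for replay.

```python
import math

DECAY, THRESHOLD = 0.3, 0.5

def rehearsal_step(strengths):
    """Decay every sample's memory strength one step; samples that fall
    below the threshold are queued for replay and restored."""
    due = []
    for sample in strengths:
        strengths[sample] *= math.exp(-DECAY)
        if strengths[sample] < THRESHOLD:
            due.append(sample)
            strengths[sample] = 1.0  # rehearsed -> strength restored
    return due

strengths = {"task_a": 1.0, "task_b": 0.6}
print(rehearsal_step(strengths))  # ['task_b'] decays below 0.5 first
```

Scheduling replay by estimated forgetting, rather than at fixed intervals, is what lets a rehearsal buffer spend its budget on the samples most at risk.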
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers have developed neural debuggers: AI models that can emulate traditional Python debuggers by stepping through code execution, setting breakpoints, and predicting both forward and backward program states. This breakthrough enables more interactive control over neural code interpretation compared to existing approaches that only execute programs linearly.
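The classical behavior such models emulate can be reproduced with Python's real tracing hook, `sys.settrace`, which is how conventional debuggers like `pdb` observe execution. The sketch below (the `traced_run` helper and `demo` function are illustrative, not from the paper) records the local state at each executed line; a step-through trace like this is what a neural debugger would have to predict.

```python
import sys

def traced_run(fn):
    """Run fn while recording (line number, locals) at each executed
    line, the way a conventional debugger steps through code."""
    states = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            states.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        fn()
    finally:
        sys.settrace(None)
    return states

def demo():
    x = 1
    x = x + 2

for lineno, local_vars in traced_run(demo):
    print(lineno, local_vars)
```

Each recorded state shows the locals *before* that line runs, so the trace doubles as the "forward program state" sequence a neural emulator is asked to predict; backward prediction has no such built-in hook.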
🏢 Meta
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduced OPENXRD, a comprehensive benchmarking framework for evaluating large language models and multimodal LLMs in crystallography question answering. The study tested 74 state-of-the-art models and found that mid-sized models (7B-70B parameters) benefit most from contextual materials, while very large models often show saturation or interference.
🧠 GPT-4 · 🧠 GPT-4.5 · 🧠 GPT-5
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers developed an LLM-agent framework to model how media influenced US-China attitudes from 2005 to 2025, testing three debiasing mechanisms to reduce AI model prejudices. The study found that devil's advocate agents were most effective at producing human-like opinion formation, while revealing geographic biases tied to AI models' origins.
🧠 GPT-4
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers developed an automated system using LLM-powered web research agents to generate and resolve forecasting questions at scale, creating 1,499 diverse real-world questions with a 96% quality rate. The system demonstrates that more advanced AI models perform significantly better at forecasting tasks, with potential applications for improving AI evaluation benchmarks.
🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers introduce ProEvolve, a graph-based framework that enables programmable evolution of AI agent environments for more realistic benchmarking. The system addresses current benchmark limitations by creating dynamic environments that can adapt and change, better reflecting real-world conditions where AI agents must operate.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers developed 'Companion,' an AI system that combines drawing robots with Large Language Models to create a collaborative artistic partner. The system engages in real-time bidirectional interaction through speech and sketching, with art experts validating its ability to produce works with distinct aesthetic identity and exhibition merit.
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers introduce NGDBench, a comprehensive benchmark for evaluating neural networks' ability to work with graph databases across five domains including finance and medicine. The benchmark supports full Cypher query language capabilities and reveals significant limitations in current AI models when handling structured graph data, noise, and complex analytical tasks.