956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Mar 5 · 5/10
🧠 Researchers conducted a large-scale empirical study analyzing 401 open-source repositories to understand how developers use cursor rules – persistent, machine-readable directives that provide context to AI coding assistants. The study identified five key themes of project context that developers consider essential: Conventions, Guidelines, Project Information, LLM Directives, and Examples.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers introduce MELODI, a framework for monitoring energy consumption during large language model inference, revealing substantial disparities in energy efficiency across different deployment scenarios. The study creates a comprehensive dataset analyzing how prompt attributes like length and complexity correlate with energy expenditure, highlighting significant opportunities for optimization in LLM deployment.
AI · Bullish · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers developed a new AI system combining Knowledge Graphs and Large Language Models to improve legal article recommendations for Chinese criminal law cases. The system achieved significant accuracy improvements, increasing from 0.549 to 0.694 in recommending relevant law articles for judicial decisions.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers introduced AttackSeqBench, a new benchmark designed to evaluate large language models' capabilities in understanding and reasoning about cyber attack sequences from threat intelligence reports. The study tested 7 LLMs, 5 LRMs, and 4 post-training strategies to assess their ability to analyze adversarial behaviors across tactical, technical, and procedural dimensions.
AI · Bullish · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers developed a new method called activation engineering to make AI language models express more human-like emotions in conversations. The technique uses targeted interventions on LLaMA 3.1-8B to enhance emotional characteristics like positive sentiment and personal engagement without extensive fine-tuning.
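The core activation-steering idea can be sketched in a few lines. This is an illustrative toy, not the paper's code: the model's hidden states are stubbed with NumPy arrays, and the "positive sentiment" direction is a difference of means over contrastive activations.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Difference-of-means direction between contrastive activation sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, direction, alpha=4.0):
    """Shift a hidden state along the unit steering direction by alpha."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.1, size=(32, 8))   # activations on "happy" prompts
neg = rng.normal(-1.0, 0.1, size=(32, 8))  # activations on neutral prompts
v = steering_vector(pos, neg)

h = rng.normal(size=8)                     # a hidden state at inference time
h_steered = steer(h, v)                    # nudged toward positive sentiment
```

In a real setup the same shift would be applied inside a forward hook on a chosen transformer layer, which is what makes the approach cheap relative to fine-tuning.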
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers developed V-GEMS, a new multimodal AI agent architecture that improves web navigation by combining visual grounding with explicit memory systems. The system achieved a 28.7% performance improvement over existing baselines by preventing navigation loops and enabling better backtracking through structured path mapping.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers have developed FinTexTS, a new large-scale dataset that pairs financial news with stock price data using semantic matching and multi-level categorization. The framework uses embedding-based matching and LLMs to classify news into four levels (macro, sector, related company, and target company) for improved stock price forecasting accuracy.
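The embedding-based matching step can be sketched as nearest-prototype classification over the four news levels. Everything here is synthetic and illustrative (the actual pipeline layers LLM classification on top of the embedding match):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)
# Stand-in prototype embeddings for the four news levels described above.
refs = {lvl: rng.normal(size=8) for lvl in
        ("macro", "sector", "related_company", "target_company")}

def classify(news_vec):
    """Assign a news embedding to the level whose prototype it matches best."""
    return max(refs, key=lambda lvl: cosine(news_vec, refs[lvl]))

# A news item whose embedding sits near the "sector" prototype.
news = refs["sector"] + rng.normal(scale=0.05, size=8)
print(classify(news))
```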
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers propose ShipTraj-R1, a novel LLM-based framework using group relative policy optimization (GRPO) for ship trajectory prediction. The system reformulates trajectory prediction as a text-to-text generation problem and demonstrates superior performance compared to existing deep learning baselines on real-world maritime datasets.
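The "group relative" part of GRPO refers to normalizing each sampled completion's reward against its group's statistics instead of learning a value critic. A generic sketch of that advantage computation (not the paper's exact objective):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean/std of its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for one prompt, scored by a trajectory reward.
print(group_relative_advantages([1.0, 2.0, 3.0, 4.0]))
```

The zero-mean advantages then weight the policy-gradient update per token of each completion.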
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠 Researchers developed a method to extract numerical prediction distributions from Large Language Models without costly autoregressive sampling by training probes on internal representations. The approach can predict statistical functionals like mean and quantiles directly from LLM embeddings, potentially offering a more efficient alternative for uncertainty-aware numerical predictions.
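A minimal sketch of the probing idea, with the LLM's internal representations stubbed as random vectors (illustrative only, not the paper's code): fit a linear probe that maps an embedding straight to a statistical functional, here the mean of the numeric answer.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
w_true = rng.normal(size=d)

# Stand-in "LLM embeddings" and the mean of the quantity they encode.
X = rng.normal(size=(200, d))
y_mean = X @ w_true + rng.normal(scale=0.01, size=200)

# Ridge-regularized linear probe fit on (embedding, mean) pairs.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y_mean)

x_new = rng.normal(size=d)
estimate = float(x_new @ w)   # mean estimate from one forward pass,
                              # no autoregressive sampling required
```

Separate probes trained against other targets (e.g. each quantile) would recover the rest of the predictive distribution the same way.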
AI · Bullish · Google AI Blog · Mar 3 · 6/10
🧠 Google announces Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in their Gemini 3 series. This release focuses on optimizing AI model performance while reducing operational costs for large-scale deployments.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10
🧠 Research reveals that leading foundation models (LLMs) perform poorly on real-world educational tasks despite excelling on AI benchmarks. The study found that 50% of misalignment errors are shared across models due to common pretraining approaches, with model ensembles actually worsening performance on learning outcomes.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers propose PARCER, a new framework that acts as an operational contract to address major governance challenges in Large Language Model systems. The framework uses structured YAML configurations to reduce variance, improve cost control, and enhance predictability in LLM operations through seven operational phases and decision hygiene practices.
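A hypothetical sketch of the operational-contract idea: bounds that every request must satisfy before dispatch. The paper specifies these in YAML; an equivalent structure is shown as a plain dict here so the example stays stdlib-only, and all field names are invented for illustration.

```python
# Contract bounds (would live in a YAML file in the framework's setup).
CONTRACT = {
    "max_tokens": 1024,        # cap on generation length
    "temperature_max": 0.3,    # variance-reduction bound
    "monthly_budget_usd": 500.0,
}

def admit(request, spent_usd):
    """Gate a request against the contract; reject any violation."""
    if request["max_tokens"] > CONTRACT["max_tokens"]:
        return False, "token limit exceeded"
    if request["temperature"] > CONTRACT["temperature_max"]:
        return False, "temperature above contracted variance bound"
    if spent_usd + request["est_cost_usd"] > CONTRACT["monthly_budget_usd"]:
        return False, "budget exhausted"
    return True, "ok"

print(admit({"max_tokens": 512, "temperature": 0.2, "est_cost_usd": 1.0}, 10.0))
print(admit({"max_tokens": 512, "temperature": 0.9, "est_cost_usd": 1.0}, 10.0))
```

Centralizing such checks in one declarative contract is what makes cost and variance auditable across the framework's operational phases.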
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift' where AI agents deviate from project guidelines, creating automated compliance checks across static code analysis, runtime commands, and architectural validation.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed a new preference learning framework that addresses bias in AI alignment by ensuring policies reflect true population distributions rather than just majority opinions. The approach uses social choice theory principles and has been validated on both recommendation tasks and large language model alignment.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce Dynamic Interaction Graph (DIG), a new framework for understanding and improving collaboration between multiple general-purpose AI agents. DIG captures emergent collaboration as a time-evolving network, making it possible to identify and correct collaboration errors in real-time for the first time.
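A toy sketch of the time-evolving network view (agent names and events invented for illustration): each message becomes a timestamped edge, and sliding-window statistics expose collaboration patterns such as an agent falling silent.

```python
from collections import Counter

# (timestamp, sender, receiver) message events from a multi-agent run.
events = [
    (0, "planner", "coder"), (1, "coder", "tester"),
    (2, "tester", "coder"), (3, "coder", "tester"),
    (4, "planner", "coder"), (5, "coder", "tester"),
]

def window_degree(events, t0, t1):
    """Out-degree per agent over messages sent in the window [t0, t1)."""
    deg = Counter()
    for t, src, _ in events:
        if t0 <= t < t1:
            deg[src] += 1
    return deg

print(window_degree(events, 0, 3))
print(window_degree(events, 3, 6))  # "tester" sends nothing in this window
```

Comparing such per-window snapshots is one simple way a monitor could flag a collaboration error while the run is still in progress.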
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce EmCoop, a new benchmark framework for studying cooperation among LLM-based embodied multi-agent systems in dynamic environments. The framework separates cognitive coordination from physical interaction layers and provides process-level metrics to analyze collaboration quality beyond just task completion success.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed MED-COPILOT, an AI-powered clinical decision-support system that combines GraphRAG retrieval with similar patient case analysis to assist healthcare professionals. The system uses structured knowledge graphs from WHO and NICE guidelines along with a 36,000-case patient database to outperform standard AI models in clinical reasoning accuracy.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce LOGIGEN, a logic-driven framework that synthesizes verifiable training data for autonomous AI agents operating in complex environments. The system uses a triple-agent orchestration approach and achieved a 79.5% success rate on benchmarks, nearly doubling the base model's 40.7% performance.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce TraceSIR, a multi-agent framework that analyzes execution traces from AI agentic systems to diagnose failures and optimize performance. The system uses three specialized agents to compress traces, identify issues, and generate comprehensive analysis reports, significantly outperforming existing approaches in evaluation tests.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 LiTS is a new modular Python framework that enables LLM reasoning through tree search algorithms like MCTS and BFS. The framework demonstrates reusable components across different domains and reveals that LLM policy diversity, not reward quality, is the key bottleneck for effective tree search in infinite action spaces.
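The tree-search pattern is easy to sketch in miniature. This is an illustrative best-first search over a toy string task, not LiTS itself: in a real setup `propose` and `score` would call an LLM policy and a reward model.

```python
import heapq

def propose(state):
    """Stub policy: propose candidate next tokens (an LLM in practice)."""
    return [state + c for c in "ab"]

def score(state, target="abba"):
    """Stub reward: count positions matching a hidden target string."""
    return sum(1 for x, y in zip(state, target) if x == y)

def tree_search(target="abba", budget=50):
    """Best-first search: always expand the highest-scoring frontier node."""
    frontier = [(-0.0, "")]            # max-heap via negated scores
    for _ in range(budget):
        if not frontier:
            break
        _, state = heapq.heappop(frontier)
        if state == target:
            return state
        if len(state) >= len(target):
            continue
        for child in propose(state):
            heapq.heappush(frontier, (-score(child, target), child))
    return None

print(tree_search())
```

With a deterministic stub policy the search converges trivially; the paper's point is that when `propose` is an LLM, the diversity of its proposals, rather than the accuracy of `score`, is what bounds search quality.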
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce InfoPO (Information-Driven Policy Optimization), a new method that improves AI agent interactions by using information-gain rewards to identify valuable conversation turns. The approach addresses credit assignment problems in multi-turn interactions and outperforms existing baselines across diverse tasks including intent clarification and collaborative coding.
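An information-gain turn reward can be sketched as entropy reduction over hypotheses about the user's intent. The belief update below is a toy Bayesian filter with made-up numbers, shown only to make the reward shape concrete:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def bayes_update(prior, likelihood):
    """Posterior over intents after observing the user's reply."""
    post = [pr * lk for pr, lk in zip(prior, likelihood)]
    z = sum(post)
    return [x / z for x in post]

prior = [0.25, 0.25, 0.25, 0.25]         # four candidate user intents
likelihood = [0.9, 0.05, 0.03, 0.02]     # reply strongly favors intent 0
post = bayes_update(prior, likelihood)
reward = entropy(prior) - entropy(post)  # information gained this turn
print(round(reward, 3))
```

Turns that resolve little uncertainty earn near-zero reward, which is what lets the method assign credit across a long multi-turn interaction.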
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers developed BioProAgent, a neuro-symbolic AI framework that combines large language models with deterministic constraints to enable reliable scientific planning in wet-lab environments. The system achieves 95.6% physical compliance compared to 21.0% for existing methods by using finite state machines to prevent costly experimental failures.
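The finite-state-machine guardrail can be sketched as a transition table that gates LLM-proposed actions. States and actions below are invented examples, not the paper's protocol:

```python
# Legal protocol transitions: (current_state, action) -> next_state.
ALLOWED = {
    ("idle", "load_sample"): "loaded",
    ("loaded", "add_reagent"): "mixed",
    ("mixed", "incubate"): "incubated",
    ("incubated", "measure"): "done",
}

def validate(plan, state="idle"):
    """Walk a proposed plan through the FSM; reject illegal transitions."""
    for action in plan:
        nxt = ALLOWED.get((state, action))
        if nxt is None:
            return False, state   # report where the plan went wrong
        state = nxt
    return True, state

print(validate(["load_sample", "add_reagent", "incubate", "measure"]))
print(validate(["load_sample", "measure"]))  # measuring too early is blocked
```

Because the FSM is deterministic, an ill-formed plan is rejected before any physical step runs, which is the mechanism behind the compliance gap reported above.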
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers propose CollabEval, a new multi-agent framework for evaluating AI-generated content that uses collaborative judgment instead of single LLM evaluation. The system implements a three-phase process with multiple AI agents working together to provide more consistent and less biased evaluations than current approaches.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers developed a method to generate 'alien' research directions by decomposing academic papers into 'idea atoms' and using AI models to identify coherent but non-obvious research paths. The system analyzes ~7,500 machine learning papers to find viable research directions that current researchers are unlikely to naturally propose.