y0news

#llm News & Analysis

956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

Beyond the Prompt: An Empirical Study of Cursor Rules

Researchers conducted a large-scale empirical study analyzing 401 open-source repositories to understand how developers use cursor rules (persistent, machine-readable directives that provide context to AI coding assistants). The study identified five key themes of project context that developers consider essential: Conventions, Guidelines, Project Information, LLM Directives, and Examples.
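
To make the five themes concrete, here is a hypothetical cursor rules file sketching one entry per theme; the file layout, paths, and rule text are invented for illustration, not taken from the study:

```markdown
# Conventions
- Use TypeScript strict mode; prefer named exports.

# Guidelines
- Keep functions under 40 lines; add tests for every new module.

# Project Information
- Monorepo managed with pnpm; the API lives in packages/api.

# LLM Directives
- Never edit generated files under src/gen/.

# Examples
- Follow the handler pattern in packages/api/src/users.ts.
```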

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

The Price of Prompting: Profiling Energy Use in Large Language Models Inference

Researchers introduce MELODI, a framework for monitoring energy consumption during large language model inference, revealing substantial disparities in energy efficiency across different deployment scenarios. The study creates a comprehensive dataset analyzing how prompt attributes like length and complexity correlate with energy expenditure, highlighting significant opportunities for optimization in LLM deployment.
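
The kind of correlation MELODI surfaces can be illustrated with a toy normalization; the readings, field names, and helper below are hypothetical, not MELODI's API:

```python
def energy_per_token(joules: float, output_tokens: int) -> float:
    """Normalize a measured energy reading by tokens generated."""
    return joules / max(output_tokens, 1)

# Hypothetical measurements: the longer prompt drew more total energy
# for the same amount of generated text.
runs = [
    {"prompt_len": 32, "output_tokens": 100, "joules": 180.0},
    {"prompt_len": 512, "output_tokens": 100, "joules": 260.0},
]
per_token = [energy_per_token(r["joules"], r["output_tokens"]) for r in runs]
```

Normalizing by output length is what lets prompt attributes such as length be compared across runs on a per-token basis.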

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding

Researchers introduced AttackSeqBench, a new benchmark designed to evaluate large language models' capabilities in understanding and reasoning about cyber attack sequences from threat intelligence reports. The study tested 7 LLMs, 5 LRMs, and 4 post-training strategies to assess their ability to analyze adversarial behaviors across tactical, technical, and procedural dimensions.

AI · Bullish · arXiv – CS AI · Mar 4 · 5/10 · 2

From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation

Researchers used activation engineering to make AI language models express more human-like emotions in negotiation dialogues. The technique applies targeted interventions to LLaMA 3.1-8B's internal activations to enhance emotional characteristics like positive sentiment and personal engagement without extensive fine-tuning.
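
As a rough sketch of the general activation-steering idea (all numbers invented; the paper's actual vectors, layers, and scaling are not specified here): a steering vector is derived as a mean activation difference between contrastive prompt sets, then added to a layer's hidden state at inference time.

```python
def steer(hidden, vector, alpha=4.0):
    """Add a scaled steering vector to one layer's hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

# Hypothetical steering vector: mean activation difference between
# "warm" prompts and "neutral" prompts (toy 3-dim activations).
warm = [[0.9, 0.1, 0.4], [1.1, -0.1, 0.6]]
neutral = [[0.1, 0.2, 0.0], [0.3, 0.0, 0.2]]
dim = len(warm[0])
v = [
    sum(a[i] for a in warm) / len(warm)
    - sum(a[i] for a in neutral) / len(neutral)
    for i in range(dim)
]

hidden = [0.5, 0.5, 0.5]       # one layer's hidden state for a token
steered = steer(hidden, v)     # pushed toward the "warm" direction
```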

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

See and Remember: A Multimodal Agent for Web Traversal

Researchers developed V-GEMS, a new multimodal AI agent architecture that improves web navigation by combining visual grounding with explicit memory systems. The system achieved a 28.7% performance improvement over existing baselines by preventing navigation loops and enabling better backtracking through structured path mapping.

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

Researchers have developed FinTexTS, a new large-scale dataset that pairs financial news with stock price data using semantic matching and multi-level categorization. The framework uses embedding-based matching and LLMs to classify news into four levels (macro, sector, related company, and target company) for improved stock price forecasting accuracy.
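
The embedding-based matching step can be sketched as nearest-neighbor pairing by cosine similarity; the entity names, vectors, and threshold below are illustrative assumptions, not the paper's configuration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pair_news(news_vec, entity_vecs, threshold=0.3):
    """Return the best-matching entity for a news embedding, or None
    when nothing clears the similarity threshold."""
    scores = {name: cosine(news_vec, v) for name, v in entity_vecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Toy embeddings for two of the pairing levels.
entities = {"target_company": [1.0, 0.0, 0.0], "sector": [0.0, 1.0, 0.0]}
match = pair_news([0.9, 0.2, 0.0], entities)
```

In the paper's pipeline, the LLM classification into macro / sector / related company / target company would sit on top of a matching step like this.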

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

Researchers propose ShipTraj-R1, a novel LLM-based framework using group relative policy optimization (GRPO) for ship trajectory prediction. The system reformulates trajectory prediction as a text-to-text generation problem and demonstrates superior performance compared to existing deep learning baselines on real-world maritime datasets.
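
The core of GRPO is computing advantages relative to a group of sampled rollouts rather than a learned value function. A minimal sketch of that advantage computation (reward values invented):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage: standardize each rollout's reward
    against its own group's mean and standard deviation."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    if sd == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sd for r in rewards]

# Four sampled trajectory completions scored against ground truth.
adv = grpo_advantages([0.2, 0.5, 0.8, 0.5])
```

Rollouts that beat their group's average get a positive advantage and are reinforced; the rest are pushed down, with no critic network required.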

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 2

Eliciting Numerical Predictive Distributions of LLMs Without Autoregression

Researchers developed a method to extract numerical prediction distributions from Large Language Models without costly autoregressive sampling by training probes on internal representations. The approach can predict statistical functionals like mean and quantiles directly from LLM embeddings, potentially offering a more efficient alternative for uncertainty-aware numerical predictions.
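
A toy version of the probing idea: fit a linear readout from fixed "embeddings" to a numeric target, so predictions come from one matrix product instead of token-by-token sampling. The embeddings and targets below are synthetic stand-ins, not real LLM representations:

```python
# Hypothetical frozen embeddings and the number each one encodes
# (targets follow y = 3*x1 - 1*x2, so a linear probe can recover them).
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
target = [3.0, -1.0, 2.0, 5.0]

# Train the probe with plain stochastic gradient descent on squared error.
w = [0.0, 0.0]
lr = 0.1
for _ in range(2000):
    for x, y in zip(emb, target):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]

preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in emb]
```

The same recipe extends to quantiles by swapping the squared-error gradient for a pinball-loss gradient.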

AI · Bullish · Google AI Blog · Mar 3 · 6/10

Gemini 3.1 Flash-Lite: Built for intelligence at scale

Google announces Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in their Gemini 3 series. This release focuses on optimizing AI model performance while reducing operational costs for large-scale deployments.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

Research reveals that leading foundation models (LLMs) perform poorly on real-world educational tasks despite excelling on AI benchmarks. The study found that 50% of misalignment errors are shared across models due to common pretraining approaches, with model ensembles actually worsening performance on learning outcomes.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

PARCER as an Operational Contract to Reduce Variance, Cost, and Risk in LLM Systems

Researchers propose PARCER, a new framework that acts as an operational contract to address major governance challenges in Large Language Model systems. The framework uses structured YAML configurations to reduce variance, improve cost control, and enhance predictability in LLM operations through seven operational phases and decision hygiene practices.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift', where AI agents deviate from project guidelines, by creating automated compliance checks across static code analysis, runtime commands, and architectural validation.
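
The general pattern of turning an instruction into an executable check can be sketched as follows; the rule text and the AST-based checker are my own illustrative example, not ContextCov's implementation:

```python
import ast

def forbids_print(source: str) -> bool:
    """Executable guardrail for a hypothetical instruction like
    'use logging, never print, in library code'."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            return False
    return True

compliant = forbids_print("import logging\nlogging.info('started')\n")
violating = forbids_print("print('debug')\n")
```

Once a rule is executable, drift is caught mechanically on every change instead of relying on the agent to keep remembering the instruction.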

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3

Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

Researchers have developed a new preference learning framework that addresses bias in AI alignment by ensuring policies reflect true population distributions rather than just majority opinions. The approach uses social choice theory principles and has been validated on both recommendation tasks and large language model alignment.
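
The contrast with majority-based aggregation can be shown with a toy example; the preference labels and helpers are invented for illustration, not the paper's formalism:

```python
from collections import Counter

def majority_policy(prefs):
    """Winner-take-all: pick the single most-preferred option."""
    return Counter(prefs).most_common(1)[0][0]

def proportional_policy(prefs):
    """Mixed policy whose action probabilities mirror the population's
    preference distribution, so minorities keep nonzero weight."""
    counts = Counter(prefs)
    n = len(prefs)
    return {option: c / n for option, c in counts.items()}

prefs = ["concise"] * 6 + ["detailed"] * 4
winner = majority_policy(prefs)      # the 40% minority vanishes
mixture = proportional_policy(prefs)  # the minority keeps 0.4 mass
```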

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 9

EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents

Researchers introduce EmCoop, a new benchmark framework for studying cooperation among LLM-based embodied multi-agent systems in dynamic environments. The framework separates cognitive coordination from physical interaction layers and provides process-level metrics to analyze collaboration quality beyond just task completion success.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval

Researchers have developed MED-COPILOT, an AI-powered clinical decision-support system that combines GraphRAG retrieval with similar patient case analysis to assist healthcare professionals. The system uses structured knowledge graphs from WHO and NICE guidelines along with a 36,000-case patient database to outperform standard AI models in clinical reasoning accuracy.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks

Researchers introduce LOGIGEN, a logic-driven framework that synthesizes verifiable training data for autonomous AI agents operating in complex environments. The system uses a triple-agent orchestration approach and achieved a 79.5% success rate on benchmarks, nearly doubling the base model's 40.7% performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Researchers introduce TraceSIR, a multi-agent framework that analyzes execution traces from AI agentic systems to diagnose failures and optimize performance. The system uses three specialized agents to compress traces, identify issues, and generate comprehensive analysis reports, significantly outperforming existing approaches in evaluation tests.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

LiTS: A Modular Framework for LLM Tree Search

LiTS is a new modular Python framework that enables LLM reasoning through tree search algorithms like MCTS and BFS. The framework demonstrates reusable components across different domains and reveals that LLM policy diversity, not reward quality, is the key bottleneck for effective tree search in infinite action spaces.
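
The selection step at the heart of MCTS-style search can be sketched with the standard UCB1 rule (this is the textbook formula, not LiTS's actual API; statistics below are invented):

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    """UCB1 score for choosing which child node to expand next:
    exploitation (mean value) plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try an unvisited child first
    return total_value / visits + c * math.sqrt(
        math.log(parent_visits) / visits
    )

def select(children, parent_visits):
    """children: list of (total_value, visits) stats per child."""
    scores = [ucb1(v, n, parent_visits) for v, n in children]
    return scores.index(max(scores))

# An unvisited branch wins selection outright.
choice = select([(3.0, 4), (1.0, 1), (0.0, 0)], parent_visits=5)
```

The exploration bonus is exactly where the paper's observation bites: if the LLM policy proposes near-duplicate actions, no amount of reward accuracy lets the tree explore meaningfully different branches.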

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Researchers introduce InfoPO (Information-Driven Policy Optimization), a new method that improves AI agent interactions by using information-gain rewards to identify valuable conversation turns. The approach addresses credit assignment problems in multi-turn interactions and outperforms existing baselines across diverse tasks including intent clarification and collaborative coding.
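
An information-gain reward can be sketched as entropy reduction over a belief about the user's intent; the distributions below are toy numbers, not the paper's reward model:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def turn_reward(prior, posterior):
    """Information-gain reward: how much this turn reduced
    uncertainty over the user's intent."""
    return entropy(prior) - entropy(posterior)

# A good clarifying question sharpens a uniform belief over 4 intents.
gain = turn_reward([0.25] * 4, [0.7, 0.1, 0.1, 0.1])
# A filler turn that changes nothing earns zero reward.
no_gain = turn_reward([0.5, 0.5], [0.5, 0.5])
```

Scoring turns this way gives the policy per-turn credit, instead of spreading one end-of-conversation reward across every turn.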

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning

Researchers developed BioProAgent, a neuro-symbolic AI framework that combines large language models with deterministic constraints to enable reliable scientific planning in wet-lab environments. The system achieves 95.6% physical compliance compared to 21.0% for existing methods by using finite state machines to prevent costly experimental failures.
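
The finite-state-machine constraint can be sketched as a transition table that rejects physically impossible action sequences; the states and actions here are a toy pipetting example, far simpler than the paper's actual protocol model:

```python
# Legal transitions: (current_state, action) -> next_state.
TRANSITIONS = {
    ("idle", "pick_tip"): "tip_loaded",
    ("tip_loaded", "aspirate"): "liquid_held",
    ("liquid_held", "dispense"): "tip_loaded",
    ("tip_loaded", "drop_tip"): "idle",
}

def validate_plan(actions, state="idle"):
    """Reject any LLM-proposed plan containing an illegal step."""
    for action in actions:
        state = TRANSITIONS.get((state, action))
        if state is None:
            return False
    return True

legal = validate_plan(["pick_tip", "aspirate", "dispense", "drop_tip"])
illegal = validate_plan(["aspirate"])  # aspirating with no tip loaded
```

Because the FSM is deterministic, a violating plan is caught before it ever reaches the bench, which is where the compliance gap between 95.6% and 21.0% comes from.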

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

Researchers propose CollabEval, a new multi-agent framework for evaluating AI-generated content that uses collaborative judgment instead of single LLM evaluation. The system implements a three-phase process with multiple AI agents working together to provide more consistent and less biased evaluations than current approaches.
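
One simple way multi-judge aggregation can work, sketched below with invented scores and an invented escalation rule (the paper's three-phase protocol is more involved):

```python
import statistics

def aggregate_judges(scores, spread_limit=2.0):
    """Combine several judge scores; a wide spread flags the item
    for a collaborative discussion round instead of a verdict."""
    if max(scores) - min(scores) > spread_limit:
        return None  # judges disagree: escalate to discussion
    return statistics.median(scores)

consensus = aggregate_judges([7.0, 8.0, 7.5])   # close scores: settle
disputed = aggregate_judges([2.0, 9.0, 5.0])    # wide spread: escalate
```

Using the median rather than the mean keeps a single outlier judge from dragging the verdict, which is one way collaboration reduces single-judge bias.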

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms

Researchers developed a method to generate 'alien' research directions by decomposing academic papers into 'idea atoms' and using AI models to identify coherent but non-obvious research paths. The system analyzes ~7,500 machine learning papers to find viable research directions that current researchers are unlikely to naturally propose.