y0news

#llm News & Analysis

956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/106

Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs

Researchers created a 4.5k-text corpus analyzing how different AI personas, including Microsoft's controversial Sydney chatbot, express views on human-AI relationships across 12 major language models. The study traces how the Sydney persona has spread memetically through training data, allowing newer models to reproduce its distinctive voice and perspectives.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/107

Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences

Researchers introduce Duel-Evolve, a new optimization algorithm that improves LLM performance at test time without requiring external rewards or labels. The method uses self-generated pairwise comparisons and achieved accuracy gains of 20 percentage points on MathBench and 12 percentage points on LiveCodeBench.
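The paper's exact procedure isn't spelled out here, but the general idea of reward-free selection via pairwise self-preferences can be sketched as a single-elimination tournament over sampled answers. Everything below is a hypothetical illustration: the `prefer` callback (here a toy length heuristic) stands in for the LLM judging pairs of its own outputs.

```python
import random

def duel_select(candidates, prefer):
    """Pick a winner among candidate answers using only pairwise
    preferences (no external reward), single-elimination style."""
    pool = list(candidates)
    while len(pool) > 1:
        random.shuffle(pool)
        next_pool = []
        # Pair up candidates; each duel keeps the preferred answer.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_pool.append(a if prefer(a, b) else b)
        if len(pool) % 2 == 1:        # odd candidate out advances
            next_pool.append(pool[-1])
        pool = next_pool
    return pool[0]

# Toy preference: in place of an LLM comparing its own answers,
# prefer the longer (more worked-out) one.
answers = ["42", "x = 42 because 6*7", "6*7 = 42, so x = 42 (checked)"]
best = duel_select(answers, prefer=lambda a, b: len(a) >= len(b))
```

Because each duel is a comparison rather than a score, no reward model or labeled data is needed; more samples simply mean more rounds of the tournament.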

AI · Bullish · arXiv – CS AI · Feb 27 · 6/106

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Researchers have developed LLM4Cov, an offline learning framework that enables AI agents to generate high-coverage hardware verification testbenches without expensive online reinforcement learning. A compact 4B-parameter model achieved 69.2% coverage pass rate, outperforming larger models by demonstrating efficient learning from execution feedback in hardware verification tasks.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/103

CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

Researchers developed CXReasonAgent, a diagnostic AI agent that combines large language models with clinical diagnostic tools to provide evidence-based chest X-ray analysis. The system addresses limitations of current vision-language models that generate plausible but ungrounded medical diagnoses, introducing a new benchmark with 1,946 diagnostic dialogues.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/107

SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy

Researchers have developed SPM-Bench, a PhD-level benchmark for testing large language models on scanning probe microscopy tasks. The benchmark uses automated data synthesis from scientific papers and introduces new evaluation metrics to assess AI reasoning capabilities in specialized scientific domains.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/106

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

Apple's App Store search team successfully implemented LLM-generated textual relevance labels to augment their ranking system, addressing data scarcity issues. A fine-tuned specialized model outperformed larger pre-trained models, generating millions of labels that improved search relevance. This resulted in a statistically significant 0.24% increase in conversion rates in worldwide A/B testing.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/105

Utilizing LLMs for Industrial Process Automation

This research explores the application of Large Language Models (LLMs) to industrial process automation, focusing on specialized programming languages used in manufacturing contexts. Unlike previous work that concentrated on general-purpose languages like Python, this study aims to integrate LLMs into industrial development workflows to solve real-world automation tasks such as robotic arm programming.

AI · Bearish · arXiv – CS AI · Feb 27 · 6/105

Moral Preferences of LLMs Under Directed Contextual Influence

A new study shows that Large Language Models' moral decision-making can be significantly swayed by contextual cues in prompts, even when the models claim neutrality. LLMs exhibit systematic bias when moral-dilemma scenarios carry directed contextual influence, challenging assumptions about AI moral consistency.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/107

Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset

Researchers released the Asta Interaction Dataset containing over 200,000 user queries from AI-powered scientific research tools, revealing how scientists interact with LLM-based research assistants. The study shows users treat these systems as collaborative research partners, submitting longer queries and using outputs as persistent artifacts for non-linear exploration.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/105

MoDora: Tree-Based Semi-Structured Document Analysis System

Researchers introduce MoDora, an AI-powered system that uses tree-based analysis to understand and answer questions about semi-structured documents containing mixed data elements like tables, charts, and text. The system addresses challenges in processing fragmented OCR data and hierarchical document structures, achieving accuracy improvements of 5.97%–61.07% over existing baselines.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/107

Probing for Knowledge Attribution in Large Language Models

Researchers developed a method to identify whether large language model outputs come from user prompts or internal training data, addressing the problem of AI hallucinations. Their linear classifier probe achieved up to 96% accuracy in determining knowledge sources, with attribution mismatches increasing error rates by up to 70%.
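The paper's probe isn't reproduced here, but the general technique is standard: a linear probe is a logistic-regression classifier trained on a model's hidden states. In this hedged sketch, synthetic Gaussian clusters stand in for real activations, with one cluster playing "prompt-sourced" states and the other "parametric-memory" states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for hidden states: "prompt-sourced" activations
# cluster in one region, "parametric-memory" activations in another.
X_prompt = rng.normal(loc=+1.0, scale=0.5, size=(200, 16))
X_memory = rng.normal(loc=-1.0, scale=0.5, size=(200, 16))
X = np.vstack([X_prompt, X_memory])
y = np.array([1] * 200 + [0] * 200)   # 1 = prompt, 0 = training data

# Linear probe: logistic regression fit by plain gradient descent.
w, b = np.zeros(16), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(prompt)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * (p - y).mean()

acc = (((X @ w + b) > 0).astype(int) == y).mean()
```

On real activations the probe would be trained layer by layer on labeled examples of prompt-grounded versus memory-grounded outputs; the point is that knowledge-source information, if present, is linearly decodable.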

AI · Bullish · arXiv – CS AI · Feb 27 · 6/105

TCM-DiffRAG: Personalized Syndrome Differentiation Reasoning Method for Traditional Chinese Medicine based on Knowledge Graph and Chain of Thought

Researchers developed TCM-DiffRAG, a novel AI framework that combines knowledge graphs with chain-of-thought reasoning to improve large language models' performance in Traditional Chinese Medicine diagnosis. The system significantly outperformed standard LLMs and other RAG methods in personalized medical reasoning tasks.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/105

dLLM: Simple Diffusion Language Modeling

Researchers introduce dLLM, an open-source framework that unifies core components of diffusion language modeling including training, inference, and evaluation. The framework enables users to reproduce, finetune, and deploy large diffusion language models like LLaDA and Dream while providing tools to build smaller models from scratch with accessible compute resources.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/107

Addressing Climate Action Misperceptions with Generative AI

A study of 1,201 climate-concerned individuals found that personalized AI conversations using climate-equipped large language models significantly improved understanding of climate action impacts and increased intentions to adopt high-impact behaviors. The personalized climate LLM outperformed web searches, unspecialized LLMs, and control groups in motivating behavior change through tailored guidance.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/105

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Researchers introduce InteractCS-RL, a new reinforcement learning framework that helps AI agents balance empathetic communication with cost-effective decision-making in task-oriented dialogue. The system uses a multi-granularity approach with persona-driven user interactions and cost-aware policy optimization to achieve better performance across business scenarios.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/108

Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation

Researchers developed GYWI, a scientific idea generation system that combines author knowledge graphs with retrieval-augmented generation to help Large Language Models generate more controllable and traceable scientific ideas. The system significantly outperforms mainstream LLMs including GPT-4o, DeepSeek-V3, Qwen3-8B, and Gemini 2.5 in metrics like novelty, reliability, and relevance.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/106

Automating the Detection of Requirement Dependencies Using Large Language Models

Researchers developed LEREDD, an LLM-based system that automates the detection of dependencies between software requirements using Retrieval-Augmented Generation and In-Context Learning. The system achieved 93% accuracy in classifying requirement dependencies, significantly outperforming existing baselines with relative gains of over 94% in F1 scores for specific dependency types.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/106

Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs

Researchers propose KGT, a novel framework that bridges the gap between Large Language Models and Knowledge Graph Completion by using dedicated entity tokens for full-space prediction. The approach addresses fundamental granularity mismatches through specialized tokenization, feature fusion, and decoupled prediction mechanisms.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/106

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Researchers propose RL-aware distillation (RLAD), a new method to efficiently transfer knowledge from large language models to smaller ones during reinforcement learning training. The approach uses Trust Region Ratio Distillation (TRRD) to selectively guide student models only when it improves policy updates, outperforming existing distillation methods across reasoning benchmarks.
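The exact TRRD objective isn't given in this summary. One plausible reading, offered strictly as an assumption, is that a distillation penalty toward the teacher is gated on a PPO-style trust-region check, so teacher guidance applies only on tokens where the student's policy update is already well-behaved. A per-token sketch:

```python
import numpy as np

def gated_distill_loss(student_logp, teacher_logp, old_logp,
                       advantage, eps=0.2):
    """Hypothetical trust-region-gated distillation: add the teacher
    penalty only where the importance ratio stays inside the clip
    region, so distillation never fights the clipped RL objective."""
    ratio = np.exp(student_logp - old_logp)
    in_region = np.abs(ratio - 1.0) < eps
    # Standard PPO clipped surrogate (negated, as a loss).
    rl_loss = -np.minimum(ratio * advantage,
                          np.clip(ratio, 1 - eps, 1 + eps) * advantage)
    # Squared log-prob gap as a simple proxy for a KL-style term.
    distill = (student_logp - teacher_logp) ** 2
    return float((rl_loss + np.where(in_region, distill, 0.0)).mean())

# Two tokens: the first is in-region (gets teacher guidance),
# the second has drifted (ratio 1.8) and is left to RL alone.
loss = gated_distill_loss(
    student_logp=np.log([0.5, 0.9]),
    teacher_logp=np.log([0.6, 0.6]),
    old_logp=np.log([0.5, 0.5]),
    advantage=np.array([1.0, 1.0]),
)
```

The gate is the speculative part; the paper's actual criterion for "when distillation improves the policy update" may differ substantially.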

AI · Bullish · arXiv – CS AI · Feb 27 · 5/107

Decoder-based Sense Knowledge Distillation

Researchers have developed Decoder-based Sense Knowledge Distillation (DSKD), a new framework that integrates lexical resources into decoder-style large language models during training. The method enhances knowledge distillation performance while enabling generative models to inherit structured semantics without requiring dictionary lookup during inference.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/105

CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines

Researchers propose Contrastive World Models (CWM), a new approach for training AI agents to better distinguish between physically feasible and infeasible actions in embodied environments. The method uses contrastive learning with hard negative examples to outperform traditional supervised fine-tuning, achieving a 6.76-percentage-point improvement in precision and better safety margins under stress conditions.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/106

Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs

Researchers introduced ReasoningMath-Plus, a new benchmark with 150 problems designed to evaluate structural mathematical reasoning in large language models. The study reveals that while leading LLMs achieve relatively high final-answer accuracy, they perform significantly worse on process-level evaluation metrics, indicating that answer-only assessments may overestimate actual reasoning capabilities.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/107

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Researchers introduced Conditioned Comment Prediction (CCP) to evaluate how well Large Language Models can simulate social media user behavior by predicting user comments. The study found that supervised fine-tuning improves text structure but degrades semantic accuracy, and that behavioral histories are more effective than descriptive personas for user simulation.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/107

EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization

Researchers developed EyeLayer, a module that integrates human eye-tracking patterns into large language models to improve code summarization. The system achieved improvements of up to 13.17% on the BLEU-4 metric by using human gaze data to guide the model's attention mechanisms.