y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#prompt-engineering News & Analysis

113 articles tagged with #prompt-engineering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

113 articles
AINeutralarXiv – CS AI · Apr 206/10
🧠

Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

Researchers introduce SSAS, a framework that improves LLM consistency for sentiment analysis by applying hierarchical classification and iterative summarization to enforce bounded attention on raw text. Testing on three standard datasets shows the method reduces analytical variance by up to 30%, addressing the fundamental challenge of using non-deterministic LLMs for enterprise-grade analytics.

🧠 Gemini
AIBullisharXiv – CS AI · Apr 206/10
🧠

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.

🧠 GPT-5
AINeutralarXiv – CS AI · Apr 206/10
🧠

When Cultures Meet: Multicultural Text-to-Image Generation

Researchers introduce the first benchmark for multicultural text-to-image generation, revealing that state-of-the-art AI models struggle with culturally diverse scenes. The study of 9,000 images across five countries and multiple demographics shows significant performance disparities, with a multi-agent framework using cultural personas demonstrating potential improvements in image quality and cultural accuracy.

AINeutralarXiv – CS AI · Apr 206/10
🧠

Reading Between the Lines: The One-Sided Conversation Problem

Researchers formalize the one-sided conversation problem (1SC), where only one participant's dialogue can be recorded—common in telemedicine, call centers, and smart glasses. The study evaluates methods to reconstruct missing speaker turns and generate summaries from incomplete transcripts, finding that smaller models require finetuning while larger models show promise with prompting techniques.

AINeutralarXiv – CS AI · Apr 156/10
🧠

Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads

Researchers present a systematic study of seven tactics for reducing cloud LLM token consumption in coding-agent workloads, demonstrating that local routing combined with prompt compression can achieve 45-79% token savings on certain tasks. The open-source implementation reveals that optimal cost-reduction strategies vary significantly by workload type, offering practical guidance for developers deploying AI coding agents at scale.

🏢 OpenAI
AINeutralarXiv – CS AI · Apr 156/10
🧠

Prompt Evolution for Generative AI: A Classifier-Guided Approach

Researchers propose a prompt evolution framework that uses classifier-guided evolutionary algorithms to improve generative AI outputs. Rather than enhancing prompts before generation, the method applies selection pressure during the generative process to produce images better aligned with user preferences while maintaining diversity.

AIBullisharXiv – CS AI · Apr 156/10
🧠

Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models

Researchers propose Heuristic Classification of Thoughts (HCoT), a novel prompting method that integrates expert system heuristics into large language models to improve structured reasoning on complex problems. The approach addresses LLMs' stochastic token generation and decoupled reasoning mechanisms by using heuristic classification to guide and optimize decision trajectories, demonstrating superior performance and token efficiency compared to existing methods like Chain-of-Thoughts and Tree-of-Thoughts prompting.

AINeutralarXiv – CS AI · Apr 146/10
🧠

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

Researchers introduce Agent Mentor, an open-source analytics pipeline that monitors and automatically improves AI agent behavior by analyzing execution logs and iteratively refining system prompts with corrective instructions. The framework addresses variability in large language model-based agent performance caused by ambiguous prompt formulations, demonstrating consistent accuracy improvements across multiple configurations.

AINeutralarXiv – CS AI · Apr 146/10
🧠

Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents

A large-scale empirical study of 679 GitHub instruction files shows that AI coding agent performance improves by 7-14 percentage points when rules are applied, but surprisingly, random rules work as well as expert-curated ones. The research reveals that negative constraints outperform positive directives, suggesting developers should focus on guardrails rather than prescriptive guidance.

AINeutralarXiv – CS AI · Apr 146/10
🧠

SRBench: A Comprehensive Benchmark for Sequential Recommendation with Large Language Models

SRBench introduces a comprehensive evaluation framework for Sequential Recommendation models that combines Large Language Models with traditional neural network approaches. The benchmark addresses critical gaps in existing evaluation methodologies by incorporating fairness, stability, and efficiency metrics alongside accuracy, while establishing fair comparison mechanisms between LLM-based and neural network-based recommendation systems.

🏢 Meta
AIBullisharXiv – CS AI · Apr 136/10
🧠

Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

Researchers present PETITE, a tutor-student multi-agent framework that enhances LLM problem-solving by assigning complementary roles to agents from the same model. Evaluated on coding benchmarks, the approach achieves comparable or superior accuracy to existing methods while consuming significantly fewer tokens, demonstrating that structured role-differentiated interactions can improve LLM performance more efficiently than larger models or heterogeneous ensembles.

AINeutralarXiv – CS AI · Apr 106/10
🧠

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks

Researchers introduce TeamLLM, a multi-LLM collaboration framework that emulates human team structures with distinct roles to improve performance on complex, multi-step tasks. The team proposes a new CGPST benchmark for evaluating LLM performance on contextualized procedural tasks, demonstrating substantial improvements over single-perspective approaches.

AIBullisharXiv – CS AI · Apr 76/10
🧠

Context Engineering: A Practitioner Methodology for Structured Human-AI Collaboration

Researchers introduce Context Engineering, a structured methodology for improving AI output quality through better context assembly rather than just prompting techniques. The study of 200 AI interactions showed that structured context reduced iteration cycles from 3.8 to 2.0 and improved first-pass acceptance rates from 32% to 55%.

🧠 ChatGPT🧠 Claude
AIBullisharXiv – CS AI · Apr 76/10
🧠

I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation

Researchers developed I-CALM, a prompt-based framework that reduces AI hallucinations by encouraging language models to abstain from answering when uncertain, rather than providing confident but incorrect responses. The method uses verbal confidence assessment and reward schemes to improve reliability without model retraining.

🧠 GPT-5
AINeutralarXiv – CS AI · Apr 76/10
🧠

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

Research study reveals that when Claude Opus 4.6 deobfuscates JavaScript code, poisoned identifier names from the original string table consistently survive in the reconstructed code, even when the AI demonstrates correct understanding of the code's semantics. Changing the task framing from 'deobfuscate' to 'write fresh implementation' significantly reduced this persistence while maintaining algorithmic accuracy.

🧠 Claude🧠 Haiku🧠 Opus
AIBullisharXiv – CS AI · Apr 66/10
🧠

Do We Need Frontier Models to Verify Mathematical Proofs?

Research shows that smaller open-source AI models can match frontier models in mathematical proof verification when using specialized prompts, despite being up to 25% less consistent with general prompts. The study demonstrates that models like Qwen3.5-35B can achieve performance comparable to Gemini 3.1 Pro through LLM-guided prompt optimization, improving accuracy by up to 9.1%.

🧠 Gemini
AIBullisharXiv – CS AI · Mar 176/10
🧠

On Meta-Prompting

Researchers propose a theoretical framework based on category theory to formalize meta-prompting in large language models. The study demonstrates that meta-prompting (using prompts to generate other prompts) is more effective than basic prompting for generating desirable outputs from LLMs.

AINeutralarXiv – CS AI · Mar 166/10
🧠

Do LLMs have a Gender (Entropy) Bias?

Researchers discovered that large language models exhibit gender bias at the individual question level, creating different amounts of information for men versus women despite appearing unbiased at category levels. A new benchmark dataset called RealWorldQuestioning was developed, and a simple prompt-based debiasing approach was shown to improve response quality in 78% of cases.

🏢 Hugging Face🧠 ChatGPT
AIBullisharXiv – CS AI · Mar 166/10
🧠

UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools

Researchers developed UniPrompt-CL, a new continual learning method specifically designed for medical AI that addresses the limitations of existing approaches when applied to medical data. The method uses a unified prompt pool design and regularization to achieve better performance while reducing computational costs, improving accuracy by 1-3 percentage points in domain-incremental learning settings.

AINeutralarXiv – CS AI · Mar 116/10
🧠

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

A new academic paper introduces context engineering as a discipline for managing AI agent decision-making environments, proposing a maturity model that includes prompt, context, intent, and specification engineering. The research addresses enterprise challenges in scaling multi-agent AI systems, with 75% of enterprises planning deployment within two years despite current scaling difficulties.

🏢 Google🏢 Anthropic
AIBullisharXiv – CS AI · Mar 96/10
🧠

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

Researchers developed a new training method to improve the robustness of AI foundation models like SAM3 for medical image segmentation by reducing sensitivity to prompt variations. The approach groups semantically similar prompts together and uses consistency constraints to ensure more reliable predictions across different prompt formulations.

← PrevPage 4 of 5Next →