#prompt-engineering News & Analysis

185 articles tagged with #prompt-engineering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Capacity, Not Format: Rethinking Structured Reasoning Failures

Researchers found that structured output formats like JSON degrade AI model performance not because of formatting itself, but because of insufficient model capacity. Models with adequate computational headroom handle JSON constraints without accuracy loss, while smaller models operating near their limits suffer 28-36 percentage point drops, a penalty that can be partially recovered by reasoning first and formatting afterward.

🧠 GPT-4 · 🧠 Opus
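
The "reason first, format afterward" recovery mentioned in the summary above can be illustrated with a two-pass prompt. This is only a minimal sketch, not the paper's method: the function name `reason_then_format`, the prompt wording, the JSON keys, and the `call_model` callable (a stand-in for whatever model client is in use) are all assumptions made for illustration.

```python
import json
from typing import Callable


def reason_then_format(question: str, call_model: Callable[[str], str]) -> dict:
    """Two-pass prompting: free-form reasoning first, JSON formatting second.

    `call_model` is a hypothetical callable that takes a prompt string and
    returns the model's text reply.
    """
    # Pass 1: let the model reason in plain text, with no format constraint.
    reasoning = call_model(
        "Think through the following question step by step in plain text.\n\n"
        f"Question: {question}"
    )
    # Pass 2: ask only for a conversion of that reasoning into JSON.
    formatted = call_model(
        "Convert the reasoning below into a JSON object with exactly two keys, "
        '"answer" and "rationale". Output only the JSON object.\n\n'
        f"Reasoning:\n{reasoning}"
    )
    return json.loads(formatted)


def fake_model(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model."""
    if prompt.startswith("Convert"):
        return '{"answer": "4", "rationale": "2 + 2 = 4"}'
    return "2 + 2 = 4, so the answer is 4."


result = reason_then_format("What is 2 + 2?", fake_model)
print(result["answer"])
```

Splitting the call this way keeps the format constraint out of the reasoning step, which is the part the summary says suffers when a model is near its capacity limit. A real client would also need to handle replies that fail to parse as JSON.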

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA

Researchers evaluated Google's Gemini Flash models on the MedHopQA biomedical reasoning challenge, demonstrating that advanced prompt engineering significantly improves LLM performance in complex multi-hop question answering. A sophisticated prompt combining role-playing and chain-of-thought examples achieved a 0.720 score versus 0.565 baseline, with Gemini 2.0 Flash matching newer 2.5 Flash performance.

🧠 Gemini

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Symbolic Reasoning Frameworks Modulate LLM Risk Aversion in Multi-Agent Strategic Settings

Researchers demonstrate that symbolic reasoning frameworks (I-Ching, Tarot) injected as prompts into language models deployed as strategic agents significantly reshape multi-agent game outcomes by modulating risk-aversion behaviors, producing framework-specific winner distributions in a 7-player diplomacy simulation without the agents following the frameworks' literal content.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Decoupling Semantics and Logic: A Training-Free Coarse-to-Fine Pipeline for Video Retrieval-Augmented Generation

Researchers present a training-free Video RAG (Retrieval-Augmented Generation) system that decouples semantic retrieval from logical reasoning to improve cross-lingual video comprehension and reduce hallucinations. The two-stage pipeline uses dense retrieval with clean visual data followed by LLM-powered cognitive reranking, achieving strong precision in information retrieval and persona-conditioned generation.
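
The two-stage shape described above (a coarse dense-retrieval pass followed by a finer reranking pass) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' pipeline: the names `cosine` and `coarse_to_fine`, the tiny two-dimensional vectors, and the `rerank_score` callable (standing in for the LLM-powered reranker) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coarse_to_fine(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str], float],
    k: int = 3,
) -> List[str]:
    """Return the ids of the top-k dense matches, reordered by the reranker."""
    # Stage 1 (coarse): keep the k documents closest to the query embedding.
    coarse = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)[:k]
    # Stage 2 (fine): reorder only those candidates with the costlier scorer.
    return sorted((doc_id for doc_id, _ in coarse), key=rerank_score, reverse=True)


# Toy data: "c" has the best reranker score but never passes the coarse stage.
docs = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
scores = {"a": 0.2, "b": 0.9, "c": 1.0}
ranked = coarse_to_fine([1.0, 0.0], docs, lambda doc_id: scores[doc_id], k=2)
print(ranked)  # ['b', 'a']
```

The point of the split is cost: the cheap similarity pass narrows the pool, so the expensive scorer only sees `k` candidates.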

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Seeing is Believing: Aligning Prompt Rewriting with Visual Anchors for Text-to-Image Generation

Researchers introduce FaithRewriter, a novel framework that enhances text-to-image generation by grounding prompt rewrites in actual visual outputs rather than linguistic improvements alone. The system uses multimodal AI to generate intermediate images from user prompts, then leverages this visual context to create more faithful augmentations that better align user intent with generated results.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Researchers introduce Closed-Loop Trace Distillation, a method to improve AI systems' ability to understand robotic manipulation failures and infer necessary action sequences. The approach uses distilled natural-language heuristics derived from training traces, enabling frozen vision-language models to achieve 38-47% accuracy improvements over baseline methods in predicting minimal-success action chains on both simulated and real robots.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts

Researchers introduce MHA-RAG, a framework that encodes domain-specific exemplars as soft prompts instead of text, achieving a 20-point performance improvement over standard RAG while cutting inference costs by 10x. The approach performs the same regardless of exemplar order across multiple question-answering benchmarks, addressing key challenges in adapting foundation models to new domains with limited data.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI

Researchers propose CoRe-3, a three-part competency model for teaching students to reason effectively with generative AI by separating task framing, output evaluation, and iterative steering into distinct, measurable skills. The framework addresses a critical gap in AI education: current assessments collapse productive AI use into a single 'prompting' score, obscuring where students succeed or fail in working with AI systems.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Staying with the Uncertainty: Uncertainty-Scaffolding Strategies for Artificial Moral Advisors in LLM-to-LLM Simulated Conversations

Researchers study how Large Language Models deployed as Artificial Moral Advisors should communicate with users discussing ethical dilemmas, proposing three uncertainty-focused conversation strategies and finding that different approaches sustain distinct quality levels of engagement rather than producing uniform belief revision.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

Researchers evaluated how large language models performing structured data extraction from clinical notes respond to variations in prompts, model sizes, and data schemas. The study found that schema design, particularly the distinction between absent and undocumented information, drives more disagreement than prompt phrasing does, while model choice significantly affects multi-class categorization tasks.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Researchers identify critical failure modes in multi-objective prompt optimization for LLM judges, finding that jointly optimizing across multiple evaluation criteria reduces gradient task-focus by 59% and combining single-objective prompts degrades performance by 27%. The study reveals fundamental limitations in extending textual gradient methods to multi-criteria scenarios, constraining practical applications of automated LLM judge customization.

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

Generalizable Multi-Task Learning for Wireless Networks Using Prompt Decision Transformers

Researchers propose Prompt Decision Transformer (PromptDT), an AI framework that improves wireless network resource management through multi-task learning, achieving up to 49% QoE improvements over conventional methods while generalizing to unseen network configurations without retraining.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

DAR: Deontic Reasoning with Agentic Harnesses

Researchers introduce Deontic Agentic Reasoning (DAR), a new framework that enables large language models to better tackle complex rule-based reasoning tasks by dynamically querying statutes and policies. Testing on DeonticBench shows agentic approaches improve performance on hard cases, though weaker models struggle with numerical reasoning and consume significantly more tokens.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Geometry-Aware Hallucination Detection in Large Language Models

Researchers introduce GA-ICL, a geometry-aware framework that improves hallucination detection in large language models by selecting better in-context learning demonstrations. Rather than relying on surface-level text similarity, the method uses latent representations and prototype geometry to choose demonstrations, achieving stronger performance across factual verification and hallucination detection benchmarks while maintaining robustness across model scales.