y0news

#prompt-engineering News & Analysis

104 articles tagged with #prompt-engineering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

Researchers introduce CA-SQL, an advanced Text-to-SQL pipeline that dynamically allocates computational resources based on task complexity to improve LLM reasoning. The method achieves state-of-the-art performance on the BIRD benchmark's challenging tier using only GPT-4o-mini, outperforming larger models and demonstrating the efficiency gains possible through intelligent inference-time optimization.

🧠 GPT-4
AI · Neutral · arXiv – CS AI · May 11 · 6/10

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Researchers present CWE-BENCH-PYTHON, a large-scale benchmark demonstrating that poorly formulated prompts significantly increase the likelihood of LLMs generating insecure code. The study shows advanced prompting techniques like Chain-of-Thought can effectively mitigate these security risks, establishing prompt quality as a critical factor in AI-generated code safety.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Prober.ai: Gated Inquiry-Based Feedback via LLM-Constrained Personas for Argumentative Writing Development

Prober.ai is an LLM-powered web-based writing environment that uses constrained AI personas and gated feedback mechanisms to improve argumentative writing through inquiry-based questioning rather than text generation. The system addresses cognitive outsourcing in education by forcing student reflection before revealing revision suggestions, grounded in Toulmin's argumentation theory and peer feedback research.

🧠 Gemini
AI · Neutral · arXiv – CS AI · May 9 · 6/10

Taklif.AI: LLM-Powered Platform for Interest-Based Personalized College Assignments

Taklif.AI is an LLM-powered educational platform that generates personalized college assignments based on students' interests and cultural contexts rather than just academic performance metrics. The system uses Llama 3.3 70B with AWS serverless architecture and achieved 84% positive reception in preliminary testing with 68 participants.

🧠 Llama
AI · Neutral · arXiv – CS AI · May 9 · 6/10

Visual Fingerprints for LLM Generation Comparison

Researchers have developed a visual fingerprinting method to compare Large Language Model outputs across different generation conditions by analyzing linguistic choices in content, expression, and structure. This approach enables pattern recognition in LLM behavior that is difficult to detect through individual responses or standard metrics, advancing model evaluation and prompt optimization techniques.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Researchers introduce MASPO, a framework that automatically optimizes prompts across multi-agent LLM systems by evaluating each agent's prompt on how well its outputs enable downstream success rather than scoring it in isolation. The approach uses evolutionary beam search to navigate prompt spaces and achieves an average accuracy improvement of 2.9% over existing methods across six diverse tasks.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology

Researchers propose 'mise en place' (MEP), a three-phase preparation methodology for AI coding agents that emphasizes contextual grounding, collaborative specification, and task decomposition before implementation. The approach counters prevalent 'vibe coding' practices by demonstrating that deliberate preparation reduces debugging overhead and enables efficient parallel agent execution, validated through a hackathon case study.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs

Researchers introduce Memory Inception (MI), a training-free method for steering large language models by inserting text-derived key-value banks at selected attention layers rather than caching full prompts. MI achieves control competitive with instruction prompting while using up to 118x less storage, and it outperforms existing activation steering methods on personality, reasoning, and guidance tasks.

AI · Bullish · arXiv – CS AI · May 7 · 6/10

RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

RaguTeam won SemEval-2026 Task 8 using a seven-model LLM ensemble with a GPT-4o-mini judge selector, achieving a conditioned harmonic mean of 0.7827 and significantly outperforming the baseline. The research demonstrates that model diversity across families, scales, and prompting strategies drives superior performance in multi-turn response generation tasks.

🧠 GPT-4
AI · Neutral · arXiv – CS AI · May 7 · 6/10

SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models

Researchers introduce SafeRedir, an inference-time framework that safely redirects unsafe prompts in image generation models by rerouting them toward benign semantic regions without modifying underlying model weights. The lightweight approach uses token-level embedding interventions to mitigate generation of NSFW content and copyrighted styles while maintaining image quality and resisting adversarial attacks.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

Researchers propose NDBench, a benchmark framework testing how frontier LLMs adapt outputs when given neurodivergence context in system prompts. The study finds that LLMs increase structural complexity (headings, steps, length) under explicit ND instructions, but persona assertion alone fails to suppress harmful behaviors—a critical finding for equitable AI system design.

AI · Bearish · arXiv – CS AI · May 4 · 6/10

Impact of Task Phrasing on Presumptions in Large Language Models

Researchers studied how task phrasing influences the decision-making of large language models, using the iterated prisoner's dilemma as a test case. The findings reveal that LLMs are prone to making presumptions based on how tasks are worded, which can impair their adaptability and reasoning and raises a safety concern for real-world deployment. Neutral task phrasing significantly reduced these presumptions, suggesting that prompt design is critical for reliable LLM performance.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

Research demonstrates that for procedural tasks, simple in-context prompting with complete procedures in the system prompt outperforms complex agent orchestration frameworks like LangGraph and CrewAI. Testing across three domains showed the simpler approach achieved quality scores of 4.53-5.00 versus 4.17-4.84 for orchestrated systems, with failure rates 50-76% lower, suggesting that advances in frontier LLM capabilities have eliminated the need for external orchestration on such tasks.

🏢 OpenAI
AI · Bullish · arXiv – CS AI · May 1 · 6/10

LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning

Researchers present LLM+ASP, a framework combining large language models with Answer Set Programming to enable nonmonotonic reasoning without task-specific engineering. The system uses automated self-correction loops where an ASP solver provides structured feedback, demonstrating significant performance improvements over monotonic logic approaches across diverse reasoning benchmarks.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

Researchers propose Comet-H, an AI system that orchestrates language models to generate research software by keeping mathematical theory, code, benchmarks, and documentation synchronized. The framework addresses hallucination and desynchronization failures in LLM-driven development, demonstrating effectiveness through a portfolio of 46 research repositories, with a static-analysis tool reaching an F1 score of 0.768.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Facial-Expression-Aware Prompting for Empathetic LLM Tutoring

Researchers demonstrate that integrating facial expression analysis into large language model prompts improves empathetic tutoring responses without requiring model retraining. Testing across three major LLM backbones with 960 multi-turn conversations, Action Unit estimation-based conditioning consistently enhanced emotional responsiveness while maintaining pedagogical quality.

🧠 GPT-5 · 🧠 Claude · 🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

Researchers introduce SSAS, a framework that improves LLM consistency for sentiment analysis by applying hierarchical classification and iterative summarization to enforce bounded attention on raw text. Testing on three standard datasets shows the method reduces analytical variance by up to 30%, addressing the fundamental challenge of using non-deterministic LLMs for enterprise-grade analytics.

🧠 Gemini
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

When Cultures Meet: Multicultural Text-to-Image Generation

Researchers introduce the first benchmark for multicultural text-to-image generation, revealing that state-of-the-art AI models struggle with culturally diverse scenes. The study of 9,000 images across five countries and multiple demographics shows significant performance disparities, with a multi-agent framework using cultural personas demonstrating potential improvements in image quality and cultural accuracy.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Reading Between the Lines: The One-Sided Conversation Problem

Researchers formalize the one-sided conversation problem (1SC), where only one participant's dialogue can be recorded—common in telemedicine, call centers, and smart glasses. The study evaluates methods to reconstruct missing speaker turns and generate summaries from incomplete transcripts, finding that smaller models require finetuning while larger models show promise with prompting techniques.

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models

Researchers propose Heuristic Classification of Thoughts (HCoT), a novel prompting method that integrates expert system heuristics into large language models to improve structured reasoning on complex problems. The approach addresses LLMs' stochastic token generation and decoupled reasoning mechanisms by using heuristic classification to guide and optimize decision trajectories, demonstrating superior performance and token efficiency compared to existing methods such as Chain-of-Thought and Tree-of-Thoughts prompting.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads

Researchers present a systematic study of seven tactics for reducing cloud LLM token consumption in coding-agent workloads, demonstrating that local routing combined with prompt compression can achieve 45-79% token savings on certain tasks. The open-source implementation reveals that optimal cost-reduction strategies vary significantly by workload type, offering practical guidance for developers deploying AI coding agents at scale.

🏢 OpenAI
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Prompt Evolution for Generative AI: A Classifier-Guided Approach

Researchers propose a prompt evolution framework that uses classifier-guided evolutionary algorithms to improve generative AI outputs. Rather than enhancing prompts before generation, the method applies selection pressure during the generative process to produce images better aligned with user preferences while maintaining diversity.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

Researchers introduce Agent Mentor, an open-source analytics pipeline that monitors and automatically improves AI agent behavior by analyzing execution logs and iteratively refining system prompts with corrective instructions. The framework addresses variability in large language model-based agent performance caused by ambiguous prompt formulations, demonstrating consistent accuracy improvements across multiple configurations.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents

A large-scale empirical study of 679 GitHub instruction files shows that AI coding agent performance improves by 7-14 percentage points when rules are applied, but surprisingly, random rules work as well as expert-curated ones. The research reveals that negative constraints outperform positive directives, suggesting developers should focus on guardrails rather than prescriptive guidance.
