#prompt-optimization News & Analysis

21 articles tagged with #prompt-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Researchers propose A2X, an LLM-native service discovery system that organizes thousands of callable services into hierarchical taxonomies to work around the context-window limits facing AI agents. The approach improves retrieval accuracy by 20+ points while cutting token consumption to one-ninth of that of baseline methods, enabling scalable orchestration of distributed services.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

Researchers introduce MEMENTO, a framework that treats web exploration as a learning signal for AI agents operating in data-scarce domains. By combining iterative web search with dual-channel memory systems, MEMENTO achieves 25-36% performance improvements over baseline models in professional applications like sales automation and legal research without requiring additional model training.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Researchers introduce Prompt Codebooks (PCO), a new framework for automatic prompt optimization that breaks down instructions into reusable, atomic components rather than treating prompts as fixed strings. The method achieves up to 30% performance gains over baseline approaches while reducing prompt lengths by 14x, enabling more efficient and adaptive language model instruction refinement.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

A new research paper demonstrates that current LLM evaluation frameworks using static prompts across all models produce misleading rankings compared to industry practice. The study reveals that prompt optimization (PO) significantly affects model performance rankings, suggesting practitioners must optimize prompts per model for accurate comparative evaluations.

AI · Bullish · arXiv – CS AI · May 1 · 7/10

ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Researchers introduce ObjectGraph (.og), a new file format designed specifically for how AI agents consume documents through retrieval rather than linear reading. The format reduces token consumption by up to 95.3% while maintaining task accuracy, addressing a fundamental architectural mismatch between traditional documents and LLM agent workflows.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

AlphaLab: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs

AlphaLab is an autonomous research system using frontier LLMs to automate experimental cycles across computational domains. Without human intervention, it explores datasets, validates frameworks, and runs large-scale experiments while accumulating domain knowledge—achieving 4.4x speedups in CUDA optimization, 22% lower validation loss in LLM pretraining, and 23-25% improvements in traffic forecasting.

🧠 GPT-5 · 🧠 Claude · 🧠 Opus

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Researchers introduce MASPOB, a bandit-based framework that optimizes prompts for Multi-Agent Systems using Graph Neural Networks to handle topology-induced coupling. The system reduces search complexity from exponential to linear while achieving state-of-the-art performance across benchmarks.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Hierarchical LLM-Based Multi-Agent Framework with Prompt Optimization for Multi-Robot Task Planning

Researchers developed a hierarchical multi-agent LLM framework that significantly improves multi-robot task planning by combining natural language processing with classical PDDL planners. The system uses prompt optimization and meta-learning to achieve success rates of up to 95% on compound tasks, outperforming previous state-of-the-art methods by substantial margins.

$COMP

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

Researchers introduce eXTC, a new framework combining structured prompt optimization with reinforcement learning to create interpretable text classifiers that balance performance with explainability. The system generates human-readable domain rules while maintaining inference speed through knowledge distillation, addressing a longstanding trade-off in AI transparency.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

TCP-MCP: Landscape-Guided Co-Evolution of Prompts and Communication Topologies for Multi-Agent Systems

TCP-MCP introduces a co-evolution framework that simultaneously optimizes AI agent prompts and communication network topologies, achieving state-of-the-art accuracy on multiple benchmarks while reducing token consumption by up to 5.69x compared to existing multi-agent systems. The approach treats prompt design and communication structure as interdependent variables rather than independent parameters, offering a practical methodology for cost-efficient multi-agent AI system design.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Researchers introduce MemTrace, a framework for debugging Large Language Model memory systems by tracing information flow through memory evolution graphs. The system identifies root causes of memory failures and uses attribution signals to automatically optimize prompts, achieving up to 7.62% performance improvements across multiple memory architectures.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization

Researchers introduce PICACO, a novel in-context alignment method that optimizes meta-instructions to help large language models better understand and balance multiple, often conflicting human values without fine-tuning. The approach uses total correlation optimization to improve alignment across up to 8 distinct values while reducing noise, addressing a key limitation where LLMs struggle to reconcile competing preferences in single prompts.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents

Researchers introduce EGL-SCA, a framework for graph reasoning agents that jointly optimizes both natural language instructions and computational tools through structural credit assignment. The system achieves 92.0% success rate on graph reasoning benchmarks by precisely routing failures to either prompt optimization or tool synthesis, outperforming isolated improvement approaches.

AI · Bearish · arXiv – CS AI · May 9 · 6/10

Self-Consistency Is Losing Its Edge: Diminishing Returns and Rising Costs in Modern LLMs

Researchers demonstrate that self-consistency—a technique where LLMs sample multiple reasoning paths to improve accuracy—delivers diminishing returns on modern models. Testing with Gemini 2.5 shows minimal accuracy gains (0.4-1.6%) while token costs scale linearly, suggesting the technique has become inefficient as model reliability improves.

🧠 Gemini

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Reasoning-Intensive Regression

Researchers introduce MENTAT, a novel method for reasoning-intensive regression (RiR)—extracting subtle numerical scores from text in specialized domains. The approach combines batch-reflective prompt optimization with neural ensemble learning, achieving up to 65% improvement over standard LLM prompting and fine-tuning approaches on tasks like rubric-based scoring and domain-specific retrieval.

AI × Crypto · Neutral · arXiv – CS AI · May 4 · 6/10

ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Researchers introduce ATLAS, a multi-agent framework that uses large language models for autonomous trading by combining dynamic prompt optimization with real-time market feedback. The system addresses key challenges in deploying LLMs for finance: adapting to delayed, noisy market signals and converting model outputs into executable orders.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

Researchers demonstrate that large language models can extract predictive features from financial news with valid intermediate signals (Information Coefficient >0.15), yet these features fail to improve reinforcement learning trading agents during macroeconomic shocks. The findings reveal a critical gap between feature-level validity and downstream policy robustness, suggesting that valid signals alone cannot guarantee trading performance under distribution shifts.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

ContextBench: Modifying Contexts for Targeted Latent Activation

Researchers have developed ContextBench, a new benchmark for evaluating methods that generate targeted inputs to trigger specific behaviors in language models. The study introduces enhanced Evolutionary Prompt Optimization techniques that better balance effectiveness in activating AI model features while maintaining linguistic fluency.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Researchers present a blueprint for evaluating and optimizing multi-agent conversational shopping assistants, addressing challenges in multi-turn interactions and tightly coupled AI systems. The paper introduces evaluation rubrics and two prompt-optimization strategies including a novel Multi-Agent Multi-Turn GEPA approach for system-level optimization.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

Retrieval, Refinement, and Ranking for Text-to-Video Generation via Prompt Optimization and Test-Time Scaling

Researchers introduce 3R, a new RAG-based framework that optimizes prompts for text-to-video generation models without requiring model retraining. The system uses three key strategies to improve video quality: RAG-based modifier extraction, diffusion-based preference optimization, and temporal frame interpolation for better consistency.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

Importance of Prompt Optimisation for Error Detection in Medical Notes Using Language Models

Researchers demonstrated that prompt optimization using Genetic-Pareto (GEPA) significantly improves language models' ability to detect errors in medical notes. The technique boosted accuracy from 0.669 to 0.785 with GPT-5 and from 0.578 to 0.690 with Qwen3-32B, achieving state-of-the-art performance on medical error detection benchmarks.