LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

This paper introduces the **Prompt Duel Optimizer (PDO)**, a novel, sample-efficient framework for **label-free prompt optimization** in large language models (LLMs). Recognizing that LLM performance is highly sensitive to input prompts and that collecting ground-truth labels is costly, PDO frames the optimization challenge as a **dueling bandit problem** where an LLM acts as a judge, providing noisy but usable **pairwise preference feedback**. PDO's effectiveness stems from two core components: **Double Thompson Sampling (D-TS)**, which intelligently prioritizes which prompt pairs to compare for efficient selection, and **Top-Performer Guided Mutation**, which periodically expands the candidate pool by generating variations of the best-performing prompts. Experimental results on datasets like BIG-bench Hard (BBH) and MS MARCO demonstrate that PDO consistently outperforms label-free baselines and can effectively mitigate judge noise by incorporating a small fraction of real labels when available.
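To make the dueling-bandit selection concrete, here is a minimal sketch of Double Thompson Sampling for choosing which prompt pair to send to the LLM judge. This is an illustrative simplification, not the paper's implementation: the `DuelingTS` class, its Beta-posterior bookkeeping, and the simulated judge are all assumptions made for the example.

```python
import random


class DuelingTS:
    """Simplified Double Thompson Sampling over prompt candidates.

    Keeps a Beta posterior over P(i beats j) for every ordered pair and,
    each round, samples posteriors twice: once to pick a (Copeland-style)
    incumbent, once to pick its strongest challenger. A sketch of the
    D-TS idea only; PDO's exact variant may differ.
    """

    def __init__(self, n_arms, seed=0):
        self.n = n_arms
        self.rng = random.Random(seed)
        # wins[i][j] = number of duels prompt i has won against prompt j
        self.wins = [[0] * n_arms for _ in range(n_arms)]

    def _sample_pref(self, i, j):
        # Posterior sample of P(i beats j), uniform Beta(1, 1) prior
        return self.rng.betavariate(self.wins[i][j] + 1, self.wins[j][i] + 1)

    def select_pair(self):
        # First arm: Copeland winner under one full posterior sample
        theta = [[self._sample_pref(i, j) if i != j else 0.5
                  for j in range(self.n)] for i in range(self.n)]
        copeland = [sum(theta[i][j] > 0.5 for j in range(self.n) if j != i)
                    for i in range(self.n)]
        first = max(range(self.n), key=lambda i: copeland[i])
        # Second arm: resample and take the strongest challenger to `first`
        challengers = [j for j in range(self.n) if j != first]
        second = max(challengers, key=lambda j: self._sample_pref(j, first))
        return first, second

    def update(self, winner, loser):
        # Record the judge's pairwise preference for this duel
        self.wins[winner][loser] += 1
```

In PDO, `update` would be driven by the LLM judge's (noisy) preference between the two prompts' outputs; here a synthetic judge stands in for it, and the win counts concentrate on the genuinely better prompt as duels accumulate.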

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.