Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

This paper introduces Multi-Objective Preference Optimization (MOPO), an algorithm for aligning large language models with human preferences that involve multiple, potentially conflicting goals such as helpfulness and harmlessness. Unlike prior methods, which often reduce multi-objective alignment to a single score, MOPO frames the problem as constrained optimization: it maximizes a primary objective while requiring the secondary objectives to meet set thresholds. Through synthetic and real-world experiments, the paper shows that MOPO effectively approximates the Pareto front (the set of optimal trade-offs between objectives) and achieves a better balance across preference dimensions than existing techniques. It also reports that the method is robust to different settings.
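
To make the two ideas in the summary concrete, here is a small illustrative sketch. It is not the MOPO algorithm itself; it only shows, on a handful of made-up candidate scores, what a Pareto front is and what "maximize a primary objective subject to a threshold on a secondary one" means. The function names, the score pairs, and the thresholds are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical (helpfulness, harmlessness) scores; higher is better for both.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point matches or beats on both objectives."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def best_under_constraint(points: List[Point], threshold: float) -> Optional[Point]:
    """Maximize the primary score among points whose secondary score meets the threshold."""
    feasible = [p for p in points if p[1] >= threshold]
    if not feasible:
        return None  # no candidate satisfies the constraint
    return max(feasible, key=lambda p: p[0])


# Made-up candidates, for illustration only.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.5), (0.4, 0.9), (0.6, 0.6)]

front = pareto_front(candidates)
choice_loose = best_under_constraint(candidates, threshold=0.5)
choice_strict = best_under_constraint(candidates, threshold=0.8)
```

In this toy setting, raising the harmlessness threshold moves the selected candidate along the Pareto front toward points with lower helpfulness. That is the kind of trade-off the summary describes, though the paper works with model policies rather than a fixed list of points.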

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.