Bradley–Terry and Multi-Objective Reward Modeling Are Complementary

This research introduces SMORM, a framework designed to make reward models for Large Language Models (LLMs) more robust to the persistent problem of "reward hacking," particularly in out-of-distribution (OOD) settings, where current state-of-the-art methods struggle because training and testing data distributions differ. SMORM jointly trains Bradley–Terry single-objective and multi-objective regression-based reward functions within a shared embedding space, and the paper shows that the two approaches are complementary: joint training makes single-objective models more robust to reward hacking, and it improves the scoring performance of multi-objective models even when fine-grained labeled data is limited, ultimately allowing smaller models to outperform much larger baselines.
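
The description above doesn't spell out the architecture, but the core idea, two reward heads trained jointly on one shared embedding, can be sketched. The following PyTorch code is an illustrative assumption, not the paper's actual SMORM implementation: the names `JointRewardModel` and `joint_loss`, the last-token pooling, and the `alpha` loss weight are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointRewardModel(nn.Module):
    """Hypothetical sketch: a shared backbone embedding feeding two heads,
    a Bradley-Terry scalar head (pairwise preferences) and a
    multi-objective regression head (fine-grained attribute scores)."""

    def __init__(self, backbone: nn.Module, hidden_dim: int, num_objectives: int):
        super().__init__()
        self.backbone = backbone                              # e.g. an LLM encoder
        self.bt_head = nn.Linear(hidden_dim, 1)               # single scalar reward
        self.mo_head = nn.Linear(hidden_dim, num_objectives)  # per-attribute scores

    def forward(self, input_ids, attention_mask):
        # Shared embedding: hidden state of the final token (one common pooling choice).
        out = self.backbone(input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state[:, -1]
        return self.bt_head(h).squeeze(-1), self.mo_head(h)

def joint_loss(model, chosen, rejected, attr_targets, alpha=1.0):
    """Combine the Bradley-Terry pairwise loss with a regression loss on
    fine-grained attribute labels (assumed available for the chosen response)."""
    r_chosen, attrs_chosen = model(**chosen)
    r_rejected, _ = model(**rejected)
    # Bradley-Terry: maximize log P(chosen preferred) = log sigmoid(r_c - r_r).
    bt = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Multi-objective regression on the shared embedding's attribute head.
    reg = F.mse_loss(attrs_chosen, attr_targets)
    return bt + alpha * reg
```

Because both heads read the same embedding, gradients from the regression loss shape the representation the Bradley–Terry head scores from, which is one plausible mechanism for the complementarity the paper reports.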
