Bradley–Terry and Multi-Objective Reward Modeling Are Complementary

This research introduces SMORM, a framework designed to make reward models for Large Language Models (LLMs) more robust to the persistent problem of "reward hacking," particularly in out-of-distribution (OOD) settings, where current state-of-the-art methods struggle because training and testing data distributions differ. SMORM jointly trains Bradley–Terry single-objective and multi-objective regression-based reward functions within a shared embedding space, and the paper shows that the two approaches are complementary: joint training makes single-objective models more robust to reward hacking, and it improves the scoring performance of multi-objective models even when fine-grained labeled data is limited, ultimately allowing smaller models to outperform much larger baselines.
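
The description above doesn't spell out the architecture, but the core idea, two reward heads trained jointly on one shared embedding, can be sketched. The following PyTorch code is an illustrative assumption, not the paper's actual SMORM implementation: the names `JointRewardModel` and `joint_loss`, the last-token pooling, and the `alpha` loss weight are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointRewardModel(nn.Module):
    """Hypothetical sketch: a shared backbone embedding feeding two heads,
    a Bradley-Terry scalar head (pairwise preferences) and a
    multi-objective regression head (fine-grained attribute scores)."""

    def __init__(self, backbone: nn.Module, hidden_dim: int, num_objectives: int):
        super().__init__()
        self.backbone = backbone                              # e.g. an LLM encoder
        self.bt_head = nn.Linear(hidden_dim, 1)               # single scalar reward
        self.mo_head = nn.Linear(hidden_dim, num_objectives)  # per-attribute scores

    def forward(self, input_ids, attention_mask):
        # Shared embedding: hidden state of the final token (one common pooling choice).
        out = self.backbone(input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state[:, -1]
        return self.bt_head(h).squeeze(-1), self.mo_head(h)

def joint_loss(model, chosen, rejected, attr_targets, alpha=1.0):
    """Combine the Bradley-Terry pairwise loss with a regression loss on
    fine-grained attribute labels (assumed available for the chosen response)."""
    r_chosen, attrs_chosen = model(**chosen)
    r_rejected, _ = model(**rejected)
    # Bradley-Terry: maximize log P(chosen preferred) = log sigmoid(r_c - r_r).
    bt = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Multi-objective regression on the shared embedding's attribute head.
    reg = F.mse_loss(attrs_chosen, attr_targets)
    return bt + alpha * reg
```

Because both heads read the same embedding, gradients from the regression loss shape the representation the Bradley–Terry head scores from, which is one plausible mechanism for the complementarity the paper reports.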
