DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?

This research introduces DELTA-Code, a benchmark designed to investigate whether Large Language Models (LLMs) can genuinely acquire and generalize novel reasoning strategies, beyond their pre-trained or post-trained capabilities, through Reinforcement Learning (RL). The paper focuses on two questions: learnability, whether RL can enable LLMs to solve coding problems that were previously unsolvable, and transferability, whether those newly acquired skills generalize systematically to out-of-distribution test sets. The authors report a striking "grokking" phase transition, in which RL-trained models suddenly reach high accuracy after an extended period of near-zero success, enabled by specific training ingredients such as curriculum training and experience replay.
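To make the two training ingredients concrete, here is a minimal toy sketch of curriculum ordering plus an experience-replay buffer. Everything here is illustrative: the `solve` stub, the integer `skill` counter, and the problem dictionaries are assumptions for demonstration, not the paper's actual RL setup.

```python
import random

def solve(problem: dict, skill: int) -> bool:
    """Toy stand-in for an RL rollout: succeeds once skill covers difficulty."""
    return skill >= problem["difficulty"]

def train(problems: list, epochs: int = 5):
    # Curriculum training: serve problems from easy to hard.
    curriculum = sorted(problems, key=lambda p: p["difficulty"])
    replay_buffer = []  # experience replay: solved problems, revisited each epoch
    skill = 0
    for _ in range(epochs):
        # Mix the curriculum with a small replayed sample of past successes,
        # so earlier skills keep being exercised as harder problems arrive.
        replayed = random.sample(replay_buffer, min(2, len(replay_buffer)))
        for p in curriculum + replayed:
            if solve(p, skill):
                if p not in replay_buffer:
                    replay_buffer.append(p)
            else:
                skill += 1  # toy proxy for a policy update after a failure
    return skill, replay_buffer

problems = [{"name": f"p{i}", "difficulty": i} for i in range(4)]
final_skill, solved = train(problems)
print(final_skill, len(solved))  # 3 4
```

The easy-first ordering gives the model solvable problems early, and the replay buffer keeps them in the training mix, which is the intuition behind both ingredients the episode highlights.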

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.