A recipe for frontier model post-training

Apple, Meta, and Nvidia all agree -- synthetic data, iterative training, human preference labels, and lots of filtering.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/frontier-model-post-training

00:00 Llama 3.1 post-training and the new normal for RLHF
01:18 A new standard pipeline
01:45 Human preference data
02:59 Scaling RLHF
05:03 Synthetic data
06:10 The new normal
06:51 Data quality is king
07:18 Apple confirms the new normal

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_018.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_020.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_031.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_033.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_035.png

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

About the Podcast

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai