OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
Now we will have some grounding for when weird ChatGPT behaviors are intended or side-effects -- shrinking the Overton window of RLHF bugs.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/openai-rlhf-model-spec

00:00 OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
02:56 Reviewing the Model Spec
08:26 Where RLHF can fail OpenAI
12:23 From Model Specs to personalization

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_027.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_029.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_033.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_034.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_041.webp
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/model-spec/img_046.webp