“o3” by Zach Stein-Perlman
See livestream, site, OpenAI thread, Nat McAleese thread.

OpenAI announced (but isn't yet releasing) o3 and o3-mini (skipping o2 because of telecom company O2's trademark). "We plan to deploy these models early next year" (source). "o3 is powered by further scaling up RL beyond o1" (source); I don't know whether it's a new base model.

o3 gets 25% on FrontierMath, smashing the previous SoTA. (These are really hard math problems.) Wow. (The dark blue bar, about 7%, is presumably one-attempt; unfortunately OpenAI didn't say what the light blue bar is, but I think it doesn't really matter and the 25% is for real.[1])

o3 also is easily SoTA on SWE-bench Verified and Codeforces.

It's also easily SoTA on ARC-AGI, after doing RL on the public ARC-AGI problems + when spending $4,000 per task on inference (!).[2]

OpenAI has a "new alignment strategy"; looks like Constitutional AI (and just about [...]

The original text contained 4 footnotes which were omitted from this narration.

The original text contained 6 images which were described by AI.

---

First published: December 20th, 2024

Source: https://forum.effectivealtruism.org/posts/aNdg7ctFP9zFcowNd/o3

---

Narrated by TYPE III AUDIO.