Anton Teaches Packy AI | Ep 2 | Chinchilla
We're back! In Episode 2, Anton teaches Packy about DeepMind's March 2022 paper, Training Compute-Optimal Large Language Models, more commonly known as Chinchilla.

Prior to Chinchilla, the best way to improve the performance of LLMs was thought to be scaling up the size of the model. As a result, the largest models now have over 500 billion parameters. But there are only so many GPUs in the world, and throwing compute at the problem is expensive and energy-intensive. In this paper, DeepMind found that the compute-optimal way to scale an LLM is actually to scale model size (parameters) and training data (tokens) in proportion. Given the race for size, today's models are plenty big but need a lot more data.

In this conversation, we go deep on the paper itself, but we also zoom out to talk about the politics of AI, when AGI is going to hit, where to get more data, and why AI won't take our jobs. This one gets a lot more philosophical than our first episode as we explore the implications of Chinchilla and of LLMs more generally.

If you enjoyed this conversation, subscribe for more. We're going to try to release one episode per week, and we want to make this the best way to get a deeper understanding of the mind-blowing progress happening in AI and what it means for everything we do as humans.

LINKS:
Training Compute-Optimal Large Language Models: https://arxiv.org/abs/2203.15556
chinchilla's wild implications: https://www.lesswrong.com/posts/6Fpvc...
Scaling Laws for Neural Language Models (Kaplan et al): https://arxiv.org/abs/2001.08361