How can AIs know what we want if *we* don't even know? (with Geoffrey Irving)

Read the full transcript here: https://podcast.clearerthinking.org/episode/194/#transcript

What does it really mean to align an AI system with human values? What would a powerful AI need to do in order to do "what we want"? How does being an assistant differ from being an agent? Could inter-AI debate work as an alignment strategy, or would it just result in arguments designed to manipulate humans via their cognitive and emotional biases? How can we make sure that all human values are learned by AIs, not just the values of humans in WEIRD societies? Are our current state-of-the-art LLMs politically left-leaning? How can alignment strategies take into account the fact that our individual and collective values occasionally change over time?

Geoffrey Irving is an AI safety researcher at DeepMind. Before that, he led the Reflection Team at OpenAI, was involved in neural network theorem proving at Google Brain, cofounded Eddy Systems to autocorrect code as you type, and worked on computational physics and geometry at Otherlab, D. E. Shaw Research, Pixar, and Weta Digital. He has screen credits on Ratatouille, WALL•E, Up, and Tintin. Learn more about him at his website, naml.us (https://naml.us/).

Further reading
• Gandalf: An Educational Game Demonstrating Security Vulnerabilities in Large Language Models (https://gandalf.lakera.ai/)
• "AI safety via debate" (https://openai.com/research/debate)
• "Claude's Constitution" (https://www.anthropic.com/index/claudes-constitution)

Staff
• Spencer Greenberg (https://www.spencergreenberg.com/): Host / Director
• Josh Castle (mailto:joshrcastle@gmail.com): Producer
• Ryan Kessler (https://tone.support/): Audio Engineer
• Uri Bram (https://uribram.com/): Factotum
• WeAmplify (https://www.weamplify.info/): Transcriptionists

Music
• Broke for Free (https://freemusicarchive.org/music/Broke_For_Free/Something_EP/Broke_For_Free_-_Something_EP_-_05_Something_Elated)
• Josh Woodward (https://www.joshwoodward.com/song/AlreadyThere)
• Lee Rosevere (https://archive.org/details/MusicForPodcasts04/Lee+Rosevere+-+Music+for+Podcasts+4+-+11+Keeping+Stuff+Together.flac)
• Quiet Music for Tiny Robots (https://www.freemusicarchive.org/music/Quiet_Music_for_Tiny_Robots/The_February_Album/05_Tiny_Robot_Armies)
• wowamusic (https://gamesounds.xyz/?dir=wowamusic)
• zapsplat.com (https://www.zapsplat.com/music/summer-haze-slow-chill-out-house-track-with-a-modern-pop-feel-warm-piano-chords-underpin-the-track-with-warm-pads-and-a-repetitive-synth-arpeggio/)

Affiliates
• Clearer Thinking (https://www.clearerthinking.org/)
• GuidedTrack (https://guidedtrack.com/)
• Mind Ease (https://mindease.io/)
• Positly (https://positly.com/)
• UpLift (https://www.uplift.app/)

Read more: https://podcast.clearerthinking.org/episode/194/geoffrey-irving-how-can-ais-know-what-we-want-if-we-don-t-even-know

About the Podcast

Clearer Thinking is a podcast about ideas that truly matter. If you enjoy learning about powerful, practical concepts and frameworks, wish you had more deep, intellectual conversations in your life, or are looking for non-BS self-improvement, then we think you'll love this podcast! Each week we invite a brilliant guest to bring four important ideas to discuss for an in-depth conversation. Topics include psychology, society, behavior change, philosophy, science, artificial intelligence, math, economics, self-help, mental health, and technology. We focus on ideas that can be applied right now to make your life better or to help you better understand yourself and the world, aiming to teach you the best mental tools to enhance your learning, self-improvement efforts, and decision-making.

We take on important, thorny questions like:
• What's the best way to help a friend or loved one going through a difficult time?
• How can we make our worldviews more accurate? How can we hone the accuracy of our thinking?
• What are the advantages of using our "gut" to make decisions? And when should we expect careful, analytical reflection to be more effective?
• Why do societies sometimes collapse? And what can we do to reduce the chance that ours collapses?
• Why is the world today so much worse than it could be? And what can we do to make it better?
• What are the good and bad parts of tradition? And are there more meaningful and ethical ways of carrying out important rituals, such as honoring the dead?
• How can we move beyond zero-sum, adversarial negotiations and create more positive-sum interactions?