NVIDIA’s Jim Fan Delves Into Large Language Models and Their Industry Impact - Ep. 204

For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Chat GPT-4 that can autonomously play Minecraft. AI agents are models that “can proactively take actions and then perceive the world, see the consequences of its actions, and then improve itself,” Fan said. Many current AI agents are programmed to achieve specific objectives, such as beating a game as quickly as possible or answering a question. They can work autonomously toward a particular output but lack a broader decision-making agency. Fan wondered if it was possible to have a “truly open-ended agent that can be prompted by arbitrary natural language to do open-ended, even creative things.” But he needed a flexible playground in which to test that possibility. “And that’s why we found Minecraft to be almost a perfect primordial soup for open-ended agents to emerge, because it sets up the environment so well,” he said. Minecraft at its core, after all, doesn’t set a specific key objective for players other than to survive and freely explore the open world. That became the springboard for Fan’s project, MineDojo, which eventually led to the creation of the AI bot Voyager. “Voyager leverages the power of Chat GPT-4 to write code in Javascript to execute in the game,” Fan explained. “GPT-4 then looks at the output, and if there’s an error from JavaScript or some feedback from the environment, GPT-4 does a self-reflection and tries to debug the code.” The bot learns from its mistakes and stores the correctly implemented programs in a skill library for future use, allowing for “lifelong learning.” In-game, Voyager can autonomously explore for hours, adapting its decisions based on its environment and developing skills to combat monsters and find food when needed. “We see all these behaviors come from the Voyager setup, the skill library and also the coding mechanism,” Fan explained. “We did not preprogram any of these behaviors.” He then spoke more generally about the rise and trajectory of LLMs. He foresees strong applications in software, gaming and robotics and increasingly pressing conversations surrounding AI safety. Fan encourages those looking to get involved and work with LLMs to “just do something,” whether that means using online resources or experimenting with beginner-friendly, CPU-based AI models.

Om Podcasten