Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs. They discuss the concept of vibes-based evaluation, which examines not just the accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:
Apple Podcasts: http://wandb.me/apple-podcasts
Spotify: http://wandb.me/spotify
Google: http://wandb.me/gd_google
YouTube: http://wandb.me/youtube

Follow Weights & Biases:
https://twitter.com/weights_biases
https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:
https://discord.gg/CkZKRNnaf3

About the Podcast

Join Lukas Biewald on Gradient Dissent, an AI-focused podcast brought to you by Weights & Biases. Dive into fascinating conversations with industry giants from NVIDIA, Meta, Google, Lyft, OpenAI, and more. Explore the cutting-edge of AI and learn the intricacies of bringing models into production.