Deep Dive into Inference Optimization for LLMs with Philip Kiely

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup that provides infrastructure for AI workloads. We go deep on inference optimization, covering how to choose a model, the hype around compound AI, how to choose an inference engine, and optimization techniques like quantization and speculative decoding, all the way down to your choice of GPU.

About the Podcast

Join Alex DeBrie and Sean Falconer for insightful, in-depth interviews with tech experts covering software development, entrepreneurship, and technology trends. Alex is the author of The DynamoDB Book, a DynamoDB expert, and an AWS Data Hero. Sean Falconer has over 20 years of experience working in research and technology as an engineer, founder, and marketing executive. Sean is a Snowflake Data Superhero. For more on Software Huddle, visit softwarehuddle.com or contact team@softwarehuddle.com.