PubNub with Stephen Blum

In this episode, we are joined by Stephen Blum, the CTO of PubNub, a company that has built an edge messaging network with over a billion connected devices. Stephen explains that while message buses like Kafka or RabbitMQ are suitable for smaller scales, PubNub focuses on the challenges of connecting mobile devices and laptops at web scale. The goal is instant signal delivery at massive scale, with low latency prioritized for a seamless user experience. To achieve this, PubNub has architected its system to be globally distributed, running on AWS with Kubernetes clusters spread across all of Amazon's zones. GeoDNS ensures that users connect to the closest region for the lowest possible latency.
Stephen goes on to discuss the challenges they faced in building their system in C, particularly around memory management and cleanup. Segmentation faults and memory leaks caused runtime problems, outages, and potential data loss. PubNub had to invest in additional memory to compensate for these leaks and spend time finding and fixing the problems. While C was efficient, it came with significant engineering costs. As a solution, PubNub started adopting Rust, which helped alleviate some of these challenges. When they replaced a service with Rust, they observed a 5x improvement in memory and performance.
Stephen also talks about choosing programming languages for their platform and the difficulty of finding and retaining C experts. They didn't consider Java due to its perceived academic nature, and Go didn't make the list of options at the time. They now have services in production written in Go, though a rewrite of part of their PubSub bus in Go performed poorly compared to the existing C system. Even so, Rust is their language of choice for new services, given its popularity and the impressive results so far.
The conversation delves into performance considerations with Python and the use of PyPy as a just-in-time compiler for optimization. While PyPy improved performance, it also required a lot of memory, which could be expensive. Rust, on the other hand, provided a significant boost in both memory and performance, making it a favorable choice for PubNub.
They also discuss provisioning: working within a budget and aiming to provision as close to actual need as possible. Kubernetes and autoscaling with HPAs (Horizontal Pod Autoscalers) are used to dynamically adjust resources based on usage.
Integrating new services into PubNub's infrastructure involves both API-based communication and event-driven approaches. They use frameworks like Axum for API-based communication and leverage Kafka with Protobuf for event sourcing. JSON is also used in some cases. Stephen explains that they chose Protobuf for high-traffic topics and for cases where stability is crucial. While the primary API for customers is JSON-based, PubNub recognizes the superior performance of Protobuf and uses it in certain cases, especially to shrink values such as booleans, which JSON spells out as multi-character strings. They also discuss the advantages of enabling compression with Protobuf.
The team reflects on the philosophy behind exploring Rust's potential for profit and its use in infrastructure and in devices such as IoT hardware. Rust's optimization for smaller binaries is highlighted, and PubNub sees it as their top choice for reliability and performance. They mention developing a Rust SDK for customers using IoT devices. The open-source nature of Rust and its ability to integrate into projects and develop open standards are also praised. While acknowledging downsides like potential instabilities and longer compilation times, they remain impressed with Rust's capabilities. The conversation covers stability and safety in Rust, with Stephen expressing confidence in the compiler's ability to handle alpha software and packages.
Relying on Rust's native concurrency primitives adds to Stephen's confidence in the safety the compiler provides. The Rust ecosystem is seen as providing adequate coverage, although packages like libRDKafka, which are pre-1.0, can be challenging to set up or deploy. Stephen emphasizes simplicity in code and avoiding excessive abstractions, while acknowledging the benefits of features like generics and traits in Rust. He suggests resources like a book by David MacLeod that focuses on learning Rust without overwhelming complexity.
Expanding on knowledge sharing within the team, Stephen discusses how Rust advocates within the team have encouraged its use and the possibilities it holds for AI infrastructure platforms. They believe Rust could improve performance and reduce latency, particularly for CPU-bound tasks in AI. They mention the adoption of Rust in the data science field, such as its use with the Parquet data format.
The importance of tooling improvements, setting strict standards, and eliminating unsafe code is highlighted. Stephen expresses the desire for a linter that enforces a simplified version of Rust to enhance code readability, maintainability, and testability. They discuss the balance between functional and object-oriented programming in Rust, suggesting object-oriented programming for larger-scale code structure and functional paradigms within functions.
Onboarding Rust engineers is also addressed: whether to prioritize candidates with prior Rust experience or to train people skilled in another language on the job. Recognizing the shortage of Rust engineers, Stephen encourages those interested in Rust to pursue a career at PubNub, pointing to resources like the company's website and LinkedIn page for tutorials and videos. He emphasizes the importance of latency in their edge messaging technology and invites users to try it out.

About the Podcast

This is "Rust in Production", a podcast about companies who use Rust to shape the future of infrastructure. We follow their journey in pursuit of more reliable and efficient software as they solve some of the most challenging technical problems in the world. Each episode dives deep into real-world applications of Rust, showcasing how this powerful systems programming language is revolutionizing the way we build and maintain critical infrastructure. From startups to tech giants, we explore the diverse landscape of organizations leveraging Rust's unique features to create safer, faster, and more scalable systems. Our guests share their experiences, challenges, and triumphs in adopting Rust for production environments. Listen in as we discuss topics such as concurrent programming, memory safety, performance optimization, and how Rust's ownership model contributes to building robust software systems. Whether you're a seasoned Rust developer, an infrastructure engineer, or a tech leader considering Rust for your next project, "Rust in Production" offers valuable insights and practical knowledge.

Release Schedule

"Rust in Production" releases new episodes every other Thursday at 4 PM UTC. Our podcast is structured into seasons, each featuring a diverse range of companies and experts in the Rust ecosystem. Recent episodes have included:
- Season 2: Interviews with representatives from System76, Fusion Engineering, OxidOS, Matic, Thunderbird, AMP, and curl.
- Season 1: Conversations with leaders from Sentry, Tweede Golf, Arroyo, Apollo, PubNub, and InfluxData.

What You'll Learn

- Real-world case studies of Rust implementation in production environments
- Insights into how companies overcome technical challenges using Rust
- Best practices for adopting Rust in various infrastructure contexts
- The impact of Rust on software reliability, efficiency, and scalability
- Future trends in systems programming and infrastructure development

Join us as we uncover the latest trends in Rust development, explore best practices for using Rust in production, and examine how this language is addressing some of the most pressing issues in modern software engineering. From web services and databases to embedded systems and cloud infrastructure, we cover the full spectrum of Rust's impact on the tech industry.