Managing Cloud Native Data on Kubernetes: Architecting Cloud Native Data Services Using Open Source Technology
Jeff Carpenter, Patrick McFadin

Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help.
Using Kubernetes as your platform, you'll learn open source technologies that are designed and built for the cloud. Authors Jeff Carpenter and Patrick McFadin provide case studies to help you explore new use cases and avoid the pitfalls others have faced. You'll get an insider's view of what's coming from innovators who are creating next-generation architectures and infrastructure.
With this book, you will:
Table of Contents
Chapter 1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
Chapter 2. Managing Data Storage on Kubernetes
Chapter 3. Databases on Kubernetes the Hard Way
Chapter 4. Automating Database Deployment on Kubernetes with Helm
Chapter 5. Automating Database Management on Kubernetes with Operators
Chapter 6. Integrating Data Infrastructure in a Kubernetes Stack
Chapter 7. The Kubernetes Native Database
Chapter 8. Streaming Data on Kubernetes
Chapter 9. Data Analytics on Kubernetes
Chapter 10. Machine Learning and Other Emerging Use Cases
Chapter 11. Migrating Data Workloads to Kubernetes
Why We Wrote This Book
We were caught up in the trend of moving stateful workloads to Kubernetes when our “day job” responsibilities at DataStax challenged us to consider how to deploy and operate Apache Cassandra in Kubernetes effectively. In the spirit of open source development, we sought out other practitioners who were attempting similar feats (and succeeding) with databases and other stateful workloads. We found a group of like-minded individuals and helped launch the Data on Kubernetes Community (DoKC) in 2020. DoKC is now an independent organization and has hosted well over 100 meetups and several in-person events. The variety of topics and presenters in the DoKC meetup is evidence of a vibrant community, working collaboratively to establish standards and best practices. Most importantly, we are learning together, applying lessons from the past and supporting each other as we build something new.
As we participated in these meetups, a set of common themes began to emerge. We heard, again and again, the virtues of the PersistentVolume subsystem, the pros and cons of StatefulSets, the promise of the operator pattern for making database operations more manageable, and the early hints of ideas for new types of data management. Over time, we developed a strong conviction that this fledgling community of practitioners needed a place for all of the wisdom scattered across multiple presentations and blog posts to be gathered and distilled into a digestible form. This book is the result of that process. Much work remains to be done in the area of cloud native data, and many areas need further exploration, including operators, machine learning, data APIs, and declarative management of data sets. Our hope is that this book opens the gates for a flood of additional books, blogs, presentations, and learning resources.
Who Is This Book For?
The primary audience for this book comprises the developers and architects who are designing, building, and running applications in the cloud. If that describes you and you’re picking up this book, chances are you’ve heard the thundering herd of organizations adopting Kubernetes and have joined that trend or are at least considering it. However, you may have also heard the reservations about stateful workloads on Kubernetes and are looking for help in how to proceed. You’ve come to the right place! By reading this book you will gain the following:
A smaller but no less important audience includes core Kubernetes developers and data infrastructure developers, many of whom we’ve met through the DoKC. We hope to create a common set of principles and best practices that we can use as a framework to drive improvements into the Kubernetes core as well as the data infrastructure built to run in Kubernetes. Together we can push the practice of data on Kubernetes forward.
For everyone, know that our objective in this book is to shoot straight. Where the technology is mature and solid, we’ll let you know, but there are also many areas where the technology is still emerging. We’ll make sure to highlight those areas where improvement is needed.
This book challenged my notions about storing data on Kubernetes. I no longer fear the loss of data.
—Jesse Anderson, Managing Director, Big Data Institute
This is the book you need if doing persistence on Kubernetes is your ultimate goal. Jeff and Patrick do a tremendous job in this comprehensive view of Data on Kubernetes to the point where it doesn't have to be scary, especially if you have this book on your shelf!
—Rick Vasquez, Senior Director, Strategic Initiatives, Western Digital
Managing Cloud Native Data on Kubernetes is a groundbreaking work not only because it is the first to tackle this problem space, but because it simultaneously obviates the need for any other book on the subject. Drawing on their decades of experience, Jeff and Patrick give readers the confidence to run stateful workloads on Kubernetes in production. This book will be the reference on the topic for years to come.
—Umair Mufti, Director of Product Management, Portworx by Pure Storage
Kubernetes is notoriously complex, and dealing with persistent data adds to the complexity. This book does an amazing job of taming the complexity of dealing with data using Kubernetes with many useful code examples and architectural diagrams.
—Noah Gift, Duke Executive in Residence
Storage is one of the hardest infrastructure layers to master and arguably has the longest innovation cycles. We are at the cusp of one such innovation cycle at the moment with cloud native applications. Jeff and Patrick have tackled this subject head-on, by having the readers understand the evolution of cloud native storage and help transform their storage strategy to meet the next gen application demands. Anyone that is working with microservices (which is almost everyone at the moment), must read this book before they have completed their transformation projects.
—Kiran Mova, Founder, Architect Storage Startups, Open Source Advocate/Manager, VMware
Jeff Carpenter has worked as a software engineer and architect in multiple industries and as a developer advocate helping engineers succeed with Apache Cassandra. He's involved in multiple open source projects in the Cassandra and Kubernetes ecosystems, including Stargate and K8ssandra. Jeff is coauthor of the O'Reilly books Cassandra: The Definitive Guide and Managing Cloud Native Data on Kubernetes.
Patrick McFadin has been a distributed systems hacker since he first plugged a modem into his Atari computer. Looking for adventure, he joined the US Navy, working on the Naval Tactical Data System (NTDS), which cemented his love of distributed systems. He then spent the 1990s working on infrastructure as the internet started to take off and barely survived the ensuing dot-com crash. Along the way, Patrick picked up a Computer Engineering degree from Cal Poly, San Luis Obispo, and has been focusing on high-scale internet infrastructure ever since. His latest obsession is distributed data systems, and he has been a steady contributor to the Apache Cassandra project since 2011.