Build and Run Data Pipelines
Mickael Maison,Kate Stanley

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you flow data between your existing systems and Kafka to process data in real time.
With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.
Who Should Read This Book
This book is written for all roles that interact with Kafka Connect environments. We have chosen to use the terms data engineers, site reliability engineers, and developers to distinguish between roles. Data engineers design and build pipelines to process and analyze data. This includes selecting the correct tools, designing the data flow, and testing the pipeline. Site reliability engineers are responsible for deploying and administering Kafka Connect environments. They may manage a single Kafka Connect cluster or many, and each cluster might be running multiple data pipelines. Finally, developers customize Kafka Connect by building custom plug-ins. This is an advanced use case, but much of the knowledge this role requires also helps data engineers assess the available tools.
In many organizations, the same engineers likely perform all three roles, while in larger organizations each role may belong to a completely different team. Although we split the book into multiple parts to cover these different roles, you will likely find it useful to understand them all.
You don’t need any prior knowledge of Kafka or Kafka Connect to read this book. If you are already familiar with Kafka, feel free to skip Chapter 2, which covers the Kafka basics you need to understand to use Kafka Connect. Equally, if you are already familiar with Kafka Connect, this book is still written with you in mind. Throughout the book we share best practices and advanced tips to help you develop your expertise further.
This book will give you all the knowledge needed to build reliable data pipelines for your use cases and run them in production. Kafka: The Definitive Guide is the go-to text for Kafka (we both keep a copy on our desks), and we hope this book will be the same for Kafka Connect.
"Kafka Connect is the pillar for integrating Apache Kafka with the rest of the data ecosystem. This book tells you everything you need to know to connect external data sources and sinks with Kafka."
-- Jun Rao,
Cofounder, Confluent
"This comprehensive book covers everything from getting started to productionizing Kafka Connect at scale. It gives you the tools to build streaming data pipelines with Apache Kafka."
-- Ryanne Dolan,
Senior Staff Software Engineer, LinkedIn
"An invaluable resource for anyone looking to use Kafka alongside existing systems. I only wish I'd had access to this book when I first began using Kafka Connect!"
-- Danica Fine,
Senior Developer Advocate, Confluent
"An invaluable resource for both novice and seasoned professionals working with Kafka Connect. It offers comprehensive explanations, and a wealth of practical tips."
-- Robin Moffatt,
Principal DevEx Engineer, Decodable
Mickael Maison is a principal software engineer at Red Hat, a committer and chair of the project management committee (PMC) for Apache Kafka.
Kate Stanley is a principal software engineer at Red Hat, a technical speaker, and a Java Champion.