Implement Trustworthy End-to-End Data Solutions
Andy Petrella

#Data
#Observability
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work.
Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.
Table of Contents
Part I. Introducing Data Observability
Chapter 1. Introducing Data Observability
Chapter 2. Components of Data Observability
Chapter 3. Roles of Data Observability in a Data Organization
Part II. Implementing Data Observability
Chapter 4. Generate Data Observations
Chapter 5. Automate the Generation of Data Observations
Chapter 6. Implementing Expectations
Part III. Data Observability in Action
Chapter 7. Integrating Data Observability in Your Data Stack
Chapter 8. Making Opaque Systems Translucent
Welcome to Fundamentals of Data Observability, a book designed to provide a robust introduction to a crucial, emerging field in data engineering and analytics.
As we venture into an era characterized by unprecedented data growth, the importance of understanding our data (its sources, destinations, usages, and behaviors) has never been greater. Observability, traditionally a term associated with software and systems engineering, has now made its way into the data space, becoming a cornerstone of trustworthy, efficient, and insightful data systems. This book aims to guide readers into the depth of this new and necessary discipline, exploring its principles, techniques, and evolving best practices.
Fundamentals of Data Observability is not just for data engineers or data scientists, but for anyone who interacts with data systems in their daily work life. Whether you’re a chief data officer (CDO), a chief technology officer (CTO), a manager, a leader, a developer, a data analyst, or a business manager, understanding data observability concepts and principles will empower you to make better decisions, build more robust systems, and gain greater insight from your data resources.
This book begins by outlining the core concepts of data observability, drawing parallels to similar concepts in software engineering, and setting the stage for the more advanced material. It subsequently delves into the principles and techniques to achieve data observability, providing practical guidance on how to implement them. The final section discusses how to get started today with the system you are using or have inherited. The book concludes with thoughts about the future of data observability, exploring ongoing research and emerging trends that are set to shape the field in the coming years.
Every chapter in this book is packed with actionable advice to reinforce the topics covered. My aim is not merely to impart knowledge but to facilitate the practical application of data observability concepts in your real-world situations.
I hope that by the end of this book, you not only will understand the “what” and “why” of data observability but will also be armed with the “how”—practical knowledge that you can apply to improve the reliability, usability, and understandability of your data systems.
The field of data observability is still young, and there is much to explore and learn. As you embark on this exciting journey, remember that understanding our data and its usages is not just a technical goal—it’s a foundation for making better decisions, fostering innovation, and driving the success of our enterprises.
Who Should Read This Book
Fundamentals of Data Observability is a vital guide for anyone who plays a role in the world of data engineering, analytics, and governance. This book provides in-depth insight into the principles of data observability and its role in ensuring efficient and reliable data systems. Here’s who should read this book, and why:
In a world increasingly dominated by data, understanding the principles of data observability is crucial. This book will equip you with the knowledge and skills to make your data systems more reliable, understandable, and usable, driving better decision making and business success. Whether you are a hands-on engineer, a team leader, or a strategic decision maker, Fundamentals of Data Observability is an essential addition to your professional library.
Andy Petrella has been in the data industry for almost 20 years, starting his career as a software engineer and data miner in the GIS space. He has evangelized big data for more than a decade, especially Apache Spark, for which he created the Spark-Notebook (which has 3100 stars on GitHub).
During his time evangelizing Spark and helping hundreds of companies in the US and the EU work on their data pipelines and models, he has witnessed the lack of visibility and control of data jobs after they are deployed in production.
Since 2015, he has been talking to tech and data-savvy people to build a sustainable solution for this problem, that is, "how to make data observable" in a way that can be adopted smoothly by any data practitioner.
Today, he is regularly invited to companies to educate their data teams, whilst running Kensu, which has more than 50 years of total development time dedicated to building the set of tools that help data engineers and their peers build trust in what they deliver.
He is also in ongoing talks with advocates such as Gartner to create a definition of Data Observability that refers to all its important facets. Finally, he has written books, blogs, slides, training materials, etc. since 2013, including many materials with O'Reilly.









