Collective Wisdom from the Experts
Tobias Macey

#97_Things
#Data
#Engineering
Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges.
Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers.
Topics include:
Table of Contents
Chapter 1. A (Book) Case for Eventual Consistency
Chapter 2. A/B and How to Be
Chapter 3. About the Storage Layer
Chapter 4. Analytics as the Secret Glue for Microservice Architectures
Chapter 5. Automate Your Infrastructure
Chapter 6. Automate Your Pipeline Tests
Chapter 7. Be Intentional About the Batching Model in Your Data Pipelines
Chapter 8. Beware of Silver-Bullet Syndrome
Chapter 9. Building a Career as a Data Engineer
Chapter 10. Business Dashboards for Data Pipelines
Chapter 11. Caution: Data Science Projects Can Turn into the Emperor's New Clothes
Chapter 12. Change Data Capture
Chapter 13. Column Names as Cont racts
Chapter 14. Consensual, Privacy-Aware Data Collection
Chapter 15. Cult ivate Good Working Relationships with Data Consumers
Chapter 16. Data Engineering ! = Spark
Chapter 17. Data Engineering for Autonomy and Ra pid Innovation
Chapter 18. Data Engineering from a Data Scientist's Perspective
Chapter 19. Data Pipeline Design Patterns for Reusability and Extensibility
Chapter 20. Data Quality for Data Engineers
Chapter 21. Data Security for Data Engineers
Chapter 22. Data Validation Is More Than Summary Statistics
Chapter 23. Data Warehouses Are the Past, Present, and Future
Chapter 24. Defining and Managing Messages in Log-Centric Architectures
Chapter 25. Demystify the Source and Illuminate the Data Pipeline
Chapter 26. Develop Communities, Not Just Code
Chapter 27. Effective Data Engineering in the Cloud World
Chapter 28. Embrace the Data Lake Architecture
Chapter 29. Embracing Data Silos
Chapter 30. Engineering Reproducible Data Science Projects
Chapter 31. Five Best Practices for Stable Data Processing
Chapter 32. Focus on Maintainability and Break Up Those ETL Tasks
Chapter 33. Friends Don't Let Friends Do Dual-Writes
Chapter 34. Fundamental Knowledge
Chapter 35. Getting the "Structured" Back into SQL
Chapter 36. Give Data Products a Frontend with Latent Documentation
Chapter 37. How Data Pipelines Evolve
Chapter 38. How to Build Your Data Platform like a Product
Chapter 39. How to Prevent a Data Mutiny
Chapter 40. Know the Value per Byte of Your Data
Chapter 41. Know Your Latencies
Chapter 42. Learn to Use a NoSQL Database, but Not like an RDBMS
Chapter 43. Let the Robots Enforce the Rules
Chapter 44. Listen to Your Users- but Not Too Much
Chapter 45. Low-Cost Sensors and the Quality of Data
Chapter 46. Maintain Your Mechanical Sympathy
Chapter 47. Metadata ≥ Data
Chapter 48. Metadata Services as a Core Component of the Data Platform
Chapter 49. Mind the Gap: Your Data Lake Provides No ACID Guarantees
Chapter 50. Modern Metadata for the Modern Data Stack
Chapter 51. Most Data Problems Are Not Big Data Problems
Chapter 52. Moving from Software Engineering to Data Engineering
Chapter 53. Observability for Data Engineers
Chapter 54. Perfect Is the Enemy of Good
Chapter 55. Pipe Dreams
Chapter 56. Preventing the Data Lake Abyss
Chapter 57. Prioritizing User Experience in Messaging Systems
Chapter 58. Privacy Is Your Problem
Chapter 59. QA and All Its Sexiness
Chapter 60. Seven Things Data Engineers Need to Watch Out for in ML Projects
Chapter 61. Six Dimensions for Picking an Analytical Data Warehouse
Chapter 62. Small Files in a Big Data World
Chapter 63. Streaming Is Different from Batch
Chapter 64. Tardy Data
Chapter 65. Tech Should Take a Back Seat for Data Project Success
Chapter 66. Ten Must-Ask Questions for Data-Engineering Projects
Chapter 67. The Data Pipeline Is Not About Speed
Chapter 68. The Dos and Don'ts of Data Engineering
Chapter 69. The End of ETL as We Know It
Chapter 70. The Haiku Approach to Writing Software
Chapter 71. The Hidden Cost of Data Input/Output
Chapter 72. The Holy War Between Proprietary and Open Source Is a Lie
Chapter 73. The Implications of the CAP Theorem
Chapter 74. The Importance of Data Lineage
Chapter 75. The Many Meanings of Missingness
Chapter 76. The Six Words That Will Destroy Your Career
Chapter 77. The Three Invaluable Benefits of Open Source for Testing Data Quality
Chapter 78. The Three Rs of Data Engineering
Chapter 79. The Two Types of Data Engineering and Data Engineers
Chapter 80. The Yin and Yang of Big Data Scalability
Chapter 81. Threading and Concurrency in Data Processing
Chapter 82. Three Important Distributed Programming Concepts
Chapter 83. Time (Semantics) Won't Wait
Chapter 84. Tools Don't Matter, Patterns and Practices Do
Chapter 85. Total Opportunity Cost of Ownership
Chapter 86. Understanding the Ways Different Data Domains Solve Problems
Chapter 87. What Is a Data Engineer? Clue: We're Data Science Enablers
Chapter 88. What Is a Data Mesh, and How Not to Mesh It Up
Chapter 89. What Is Big Data?
Chapter 90. What to Do When You Don't Get Any Credit
Chapter 91. When Our Data Science Team Didn't Produce Value
Chapter 92. When to Avoid the Naive Approach
Chapter 93. When to Be Cautious About Sharing Data
Chapter 94. When to Talk and When to Listen
Chapter 95. Why Data Science Teams Need Generalists, Not Specialists
Chapter 96. With Great Data Comes Great Responsibility
Chapter 97. Your Data Tests Failed! Now What?
Tobias Macey hosts the Data Engineering Podcast and Podcast.__init__, where he discusses the tools, topics, and people that comprise the data engineering and Python communities, respectively. His experience across the domains of infrastructure, software, cloud, and data engineering allows him to ask informed questions and bring useful context to the discussions. The ongoing focus of his career is to help educate people through designing and building platforms that power online learning, consulting with companies and investors to understand the possibilities of emerging technologies, and leading teams of engineers to help them grow professionally.









