Book title
Cost-Effective Data Pipelines

Balancing Trade-Offs When Developing Pipelines in the Cloud

Sev Leonard

Paperback: 289 pages
Publisher: O'Reilly
Edition: 1
Language: English
Year: 2023
ISBN: 9781492098645
Print options:
  • Hardcover: 597,000 Toman
  • Paperback: 517,000 Toman
  • Spiral-bound with plastic cover: 527,000 Toman
Text quality: publisher's original
Trim size: B5
Page color: colored text and boxes

#Data

#Pipelines

Description

The low cost of getting started with cloud services can easily evolve into a significant expense down the road. That's challenging for teams developing data pipelines, particularly when rapid changes in technology and workload require a constant cycle of redesign. How do you deliver scalable, highly available products while keeping costs in check?


With this practical guide, author Sev Leonard provides a holistic approach to designing scalable data pipelines in the cloud. Intermediate data engineers, software developers, and architects will learn how to navigate cost/performance trade-offs and how to choose and configure compute and storage. You'll also pick up best practices for code development, testing, and monitoring.


By focusing on the entire design process, you'll be able to deliver cost-effective, high-quality products. This book helps you:

  • Reduce cloud spend with lower-cost cloud service offerings and smart design strategies
  • Minimize waste without sacrificing performance by rightsizing compute resources
  • Drive pipeline evolution, head off performance issues, and quickly debug with effective monitoring
  • Set up development and test environments that minimize cloud service dependencies
  • Create data pipeline code bases that are testable and extensible, fostering rapid development and evolution
  • Improve data quality and pipeline operation through validation and testing


Chapter 1. Designing Compute for Data Pipelines

Chapter 2. Responding to Changes in Demand by Scaling Compute

Chapter 3. Data Organization in the Cloud

Chapter 4. Economical Pipeline Fundamentals

Chapter 5. Setting Up Effective Development Environments

Chapter 6. Software Development Strategies

Chapter 7. Unit Testing

Chapter 8. Mocks

Chapter 9. Data for Testing

Chapter 10. Logging

Chapter 11. Finding Your Way with Monitoring

Chapter 12. Essential Takeaways


Who This Book Is For

I’ve geared the content toward an intermediate to advanced audience. I assume you have some familiarity with software development best practices, some basics about working with cloud compute and storage, and a general idea about how batch and streaming data pipelines operate.


This book is written from my experience in the day-to-day development of data pipelines. If this is work you either do already or aspire to do in the future, you can consider this book a virtual mentor, advising you of common pitfalls and providing guidance honed from working on a variety of data pipeline projects.


If you’re coming from a data analysis background, you’ll find advice on software best practices to help you build testable, extendable pipelines. This will aid you in connecting analysis with data acquisition and storage to create end-to-end systems.


Developer velocity and cost-conscious design are areas everyone from individual contributors to managers should have on their mind. In this book, you’ll find advice on how to build quality into the development process, make efficient use of cloud resources, and reduce costs. Additionally, you’ll see the elements that go into monitoring to not only keep tabs on system health and performance but also gain insight into where redesign should be considered.


If you manage data engineering teams, you’ll find helpful tips on effective development practices, areas where costs can escalate, and an overall approach to putting the right practices in place to help your team succeed.


What You Will Learn

If you would like to learn or improve your skill in the following, this book will be a useful guide:

  • Reduce cloud spend with lower-cost cloud service offerings and smart design strategies.
  • Minimize waste without sacrificing performance by right-sizing compute resources.
  • Drive pipeline evolution, head off performance issues, and quickly debug with cost-effective monitoring and logging.
  • Set up development and test environments that minimize cloud service costs.
  • Create data pipeline codebases that are testable and extensible, reducing development time and accelerating pipeline evolution.
  • Limit costly data downtime by improving data quality and pipeline operation through validation and testing.


What This Book Is Not

This is not an architecture book. There are aspects that tie back into architecture and system requirements, but I will not be discussing different architectural approaches or trade-offs. I do not cover topics such as data governance, data cataloging, or data lineage.


While I provide advice on how to manage the innate cost–performance trade-offs of building data pipelines in the cloud, this book is not a financial operations (FinOps) text. Where a FinOps book would, for example, direct you to look for unused compute instance hours as potential opportunities to reduce costs, this book gets into the nitty-gritty details of reducing instance hours and associated costs.

The design space of data pipelines is constantly growing and changing. The biggest value I can provide is to describe design techniques that can be applied in a variety of circumstances as the field evolves. Where relevant, I mention some specific, fully managed data ingestion services such as Amazon Web Services (AWS) Glue or Google Dataflow, but the focus of this book is on classes of services that apply across many vendors. Understanding these foundational services will help you get the most out of vendor-managed services.


The cloud service offerings I focus on include object storage such as AWS S3 and Google Cloud Storage (GCS), serverless functions such as AWS Lambda, and cluster compute services such as AWS Elastic Compute Cloud (EC2), AWS Elastic MapReduce (EMR), and Kubernetes. While managing system boundaries, identity management, and security are aspects of this approach, I will not be covering these topics in this book.

I do not provide advice about database services in this book, as the choice of databases and configurations is highly dependent on specific use cases.

You will learn what you need to log and monitor, but I will not cover the details on how to set up monitoring, as tools used for monitoring vary from company to company.


Review

The cloud data revolution of the mid-2010s gave data engineers easy access to compute and storage at extraordinary scale, but this sea change also made engineers responsible for the daily dollars and cents of their workloads. This is the book we've been waiting for to provide clear, opinionated guidance on monitoring, controlling, and optimizing the costs of high-performance cloud data systems.

-- Matthew Housley

CTO and coauthor of Fundamentals of Data Engineering


Sev's best practices and strategies could have saved my employer millions of dollars. That's a pretty good return on investment for the price of a book and the time to read it.

-- Bar Shirtcliff

Software Engineer


Managing data at scale has always been challenging. Most organizations struggle with over-provisioning resources and inflated project costs. This book provides crystal clear insight on overcoming these challenges and keeping your costs as low as possible.

-- Milind Chaudhari

Sr. Cloud Data Engineer/Architect


This is the most readable guide I've seen in decades for designing and building robust real-world data pipelines. With plenty of context and detailed, non-trivial examples using real-world code, this book will be your 24/7 expert when working through messy problems that have no easy solutions. You'll learn to balance complex trade-offs among cost, performance, implementation time, long-term support, future growth, and myriad other elements that make up today's complex data pipeline landscape.

-- Arnie Wernick

Sr. Technical, IP, and Strategy Advisor


Real world data pipelines are notoriously fickle. Things change, and things break. This book is a great resource for getting ahead of costly data pipeline problems before they get ahead of you.

-- Joe Reis

coauthor of Fundamentals of Data Engineering


This is the manual I wish I had when I was just getting started with data; it would have saved me a lot of suffering! But whether you're just getting started or have decades of experience, the accessible strategies Sev has developed will not only help you build more reliable, cost-effective pipelines; they will also help you communicate about them to a variety of stakeholders. A must-read for anyone working with data!

-- Rachel Shadoan

Co-Founder of Akashic Labs


About the Author

With over 20 years of experience in the technology industry, Sev brings a breadth of experience spanning circuit design for Intel microprocessors, user-driven application development, and data platform development at both small and large scale. Throughout his career, Sev has been a writer, speaker, and teacher along with his technical contributions, seeking to pass on what he has learned and make technology education accessible to all.


Sev's experience developing cloud data pipelines across multiple cloud service providers in large-scale batch and real-time environments, alongside his established record of writing and teaching, makes him uniquely qualified to write Cost-Effective Data Pipelines. His hands-on experience as a data engineer, coupled with his ability to synthesize ideas, gives him both the subject matter expertise to speak on the topics in Cost-Effective Data Pipelines and the ability to explain these advanced concepts to readers. Sev's focus on providing actionable, hands-on content in his classes, tutorials, and interactive sessions guarantees an approach that readers will be able to quickly put into practice.

© 2022-2024 Skybook Digital Printing.