Book title
Data Pipelines with Apache Airflow

Julian de Ruiter, Ismael Cabral, Kris Geusebroek, Daniel van der Ende, Bas Harenslak

Paperback: 513 pages
Publisher: Manning
Edition: 2
Language: English
Year: 2026
ISBN: 9781633433885
Text quality: publisher's original
Trim size: B5
Page color: includes colored text and boxes


Description

Data Pipelines with Apache Airflow has empowered thousands of data engineers to build more successful data platforms. This new second edition has been fully revised for Airflow 3, with coverage of the latest features of Apache Airflow, including the TaskFlow API, deferrable operators, and Large Language Model integration. Filled with real-world scenarios and examples, it guides you carefully from Airflow novice to expert.


Using real-world scenarios and examples, this book teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack. Part reference and part tutorial, each technique is illustrated with engaging hands-on examples, from training machine learning models for generative AI to optimizing delivery routes.


In Data Pipelines with Apache Airflow, Second Edition you'll learn how to:


• Master the core concepts of Airflow architecture and workflow design

• Schedule data pipelines using the Dataset API and timetables, including complex irregular schedules

• Develop custom Airflow components for your specific needs

• Implement comprehensive testing strategies for your pipelines

• Apply industry best practices for building and maintaining Airflow workflows

• Deploy and operate Airflow in production environments

• Orchestrate workflows in container-native environments

• Build and deploy Machine Learning and Generative AI models using Airflow


About the Technology

Apache Airflow provides a unified platform for collecting, consolidating, cleaning, and analyzing data. With its easy-to-use UI, powerful scheduling and monitoring features, plug-and-play options, and flexible Python scripting, Airflow makes it easy to implement secure, consistent pipelines for any data or AI task.


About the book

Data Pipelines with Apache Airflow, Second Edition teaches you how to build, monitor, and maintain effective data workflows. This new edition adds comprehensive coverage of Airflow 3 features, such as event-driven scheduling, dynamic task mapping, DAG versioning, and Airflow’s entirely new UI. The numerous examples address common use cases like data ingestion and transformation and connecting to multiple data sources, along with AI-aware techniques such as building RAG systems.


What's inside

• Deploying data pipelines as Airflow DAGs

• Time-based and event-based scheduling strategies

• Integrating with databases, LLMs, and AI models

• Deploying Airflow using Kubernetes


About the reader

For data engineers, machine learning engineers, DevOps, and sysadmins with intermediate Python skills.


Table of Contents

Part 1. Getting started

1. Meet Apache Airflow

2. Anatomy of an Airflow DAG

3. Time-based scheduling

4. Asset-aware scheduling

5. Templating tasks using the Airflow context

6. Defining dependencies between tasks


Part 2. Beyond the basics

7. Triggering workflows with external input

8. Communicating with external systems

9. Extending Airflow with custom operators and sensors

10. Testing

11. Running tasks in containers


Part 3. Airflow in practice

12. Best practices

13. Project: Finding the fastest way to get around NYC

14. Project: Keeping family traditions alive with Airflow and generative AI


Part 4. Airflow in production

15. Operating Airflow in production

16. Securing Airflow

17. Airflow deployment options


A. Running code samples

B. Prometheus metric mapping


About the Author

Julian de Ruiter is a Data + AI engineering lead at Xebia Data, with a background in computer and life sciences and a PhD in computational cancer biology. As a consultant at Xebia Data, he enjoys helping clients design and build AI solutions and platforms, as well as the teams that drive them. From this work, he has extensive experience in deploying and applying Apache Airflow in production in diverse environments.


Ismael Cabral is a Machine Learning Engineer and Airflow trainer with experience spanning Europe, the US, Mexico, and South America, where he has worked with market-leading companies. He has vast experience implementing data pipelines and deploying machine learning models in production.


Kris Geusebroek is a data engineering consultant with extensive hands-on experience using Apache Airflow at several clients. He maintains Whirl, an open source repository for local testing with Airflow, where he actively adds examples based on new functionality and new technologies that integrate with Airflow.


Daniel van der Ende is a Data Engineer who first started using Apache Airflow back in 2016. Since then, he has worked in many different Airflow environments, both on-premises and in the cloud. He has actively contributed to the Airflow project itself, as well as related projects such as Astronomer-Cosmos.


Bas Harenslak is a Staff Architect at Astronomer, where he helps customers develop mission-critical data pipelines at large scale using Apache Airflow and the Astro platform. With a background in software engineering and computer science, he enjoys working on software and data as if they are challenging puzzles. He favours working on open source software, is a committer on the Apache Airflow project, and co-author of the first edition of Data Pipelines with Apache Airflow.
