An Iterative Process for Production-Ready Applications
Chip Huyen

#Machine_Learning
#ML
#ML_systems
#data
🤖 Machine learning (ML) systems are both complex and unique.
⚙️ Complex because they consist of many different components and involve many different stakeholders.
🌐 Unique because they are data dependent, and data varies widely from one use case to the next.
📘 In this book, you will learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
🛠 Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help the system as a whole achieve its objectives.
📊 The iterative framework in this book is supported by real case studies and backed by ample references.
💡 This book will help you tackle scenarios such as:
🔹 Engineering data and choosing the right metrics to solve business problems
🔹 Automating the process of continually developing, evaluating, deploying, and updating models
🔹 Developing a monitoring system to quickly detect and fix problems with models in production
🔹 Designing an ML platform that serves across use cases
🔹 Developing responsible ML systems
📑 Table of Contents
Chapter 1: Overview of Machine Learning Systems
Chapter 2: Introduction to Machine Learning Systems Design
Chapter 3: Data Engineering Fundamentals
Chapter 4: Training Data
Chapter 5: Feature Engineering
Chapter 6: Model Development and Offline Evaluation
Chapter 7: Model Deployment and Prediction Service
Chapter 8: Data Distribution Shifts and Monitoring
Chapter 9: Continual Learning and Test in Production
Chapter 10: Infrastructure and Tooling for MLOps
Chapter 11: The Human Side of Machine Learning
⭐ Reviews
💬 "This is the best book you can read about building, deploying, and scaling ML models at a company for maximum impact. Chip is a masterful teacher, and her knowledge is unparalleled." – Josh Wills
💬 "If you are serious about ML in production and want to design and implement ML systems end to end, this book is essential." – Laurence Moroney
💬 "One of the best resources that focuses on the first principles of designing ML systems for production. Essential for navigating a fast-changing world full of tools and platforms." – Goku Mohandas
💬 "This book is your map and compass in a crowded but growing ecosystem. It is essential for practitioners inside and outside Big Tech." – Jacopo Tagliabue
💬 "Chip is truly a world-class expert on ML systems and a brilliant writer. This book is a fantastic resource for learning about this topic." – Andrey Kurenkov
👩💻 About the Author
Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning.
💻 She has worked with companies such as NVIDIA, Netflix, and Snorkel AI, and has helped some of the world's largest organizations develop and deploy ML systems.
🎓 She teaches CS 329S: Machine Learning Systems Design at Stanford University; this book is based on the lecture notes for that course.
🏆 LinkedIn named her a Top Voice in Software Development (2019) and in Data Science & AI (2020).
📚 She is also the author of four bestselling books and runs a Discord server on MLOps with more than 6,000 members.
Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
Author Chip Huyen, co-founder of Claypot AI, considers each design decision--such as how to process and create training data, which features to use, how often to retrain models, and what to monitor--in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.
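This book will help you tackle scenarios such as:
Engineering data and choosing the right metrics to solve a business problem
Automating the process for continually developing, evaluating, deploying, and updating models
Developing a monitoring system to quickly detect and address issues your models might encounter in production
Architecting an ML platform that serves across use cases
Developing responsible ML systems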
Table of Contents
Chapter 1. Overview of Machine Learning Systems
Chapter 2. Introduction to Machine Learning Systems Design
Chapter 3. Data Engineering Fundamentals
Chapter 4. Training Data
Chapter 5. Feature Engineering
Chapter 6. Model Development and Offline Evaluation
Chapter 7. Model Deployment and Prediction Service
Chapter 8. Data Distribution Shifts and Monitoring
Chapter 9. Continual Learning and Test in Production
Chapter 10. Infrastructure and Tooling for MLOps
Chapter 11. The Human Side of Machine Learning
"This is, simply, the very best book you can read about how to build, deploy, and scale machine learning models at a company for maximum impact. Chip is a masterful teacher, and the breadth and depth of her knowledge is unparalleled."
- Josh Wills, Software Engineer at WeaveGrid and former Director of Data Engineering, Slack
"There is so much information one needs to know to be an effective machine learning engineer. It's hard to cut through the chaff to get the most relevant information, but Chip has done that admirably with this book. If you are serious about ML in production, and care about how to design and implement ML systems end to end, this book is essential."
- Laurence Moroney, AI and ML Lead, Google
"One of the best resources that focuses on the first principles behind designing ML systems for production. A must-read to navigate the ephemeral landscape of tooling and platform options."
- Goku Mohandas, Founder of Made With ML
"Chip's manual is the book we deserve and the one we need right now. In a blooming but chaotic ecosystem, this principled view on end-to-end ML is both your map and your compass: a must-read for practitioners inside and outside of Big Tech—especially those working at 'reasonable scale.' This book will also appeal to data leaders looking for best practices on how to deploy, manage, and monitor systems in the wild."
- Jacopo Tagliabue, Director of AI, Coveo; Adj. Professor of MLSys, NYU
"Chip is truly a world-class expert on machine learning systems, as well as a brilliant writer. Both are evident in this book, which is a fantastic resource for anyone looking to learn about this topic."
- Andrey Kurenkov, PhD Candidate at the Stanford AI Lab
Ever since the first machine learning course I taught at Stanford in 2017, many people have asked me for advice on how to deploy ML models at their organizations. These questions can be generic, such as "What model should I use?" "How often should I retrain my model?" "How can I detect data distribution shifts?" "How do I ensure that the features used during training are consistent with the features used during inference?"
These questions can also be specific, such as "I'm convinced that switching from batch prediction to online prediction will give our model a performance boost, but how do I convince my manager to let me do so?" or "I'm the most senior data scientist at my company and I've recently been tasked with setting up our first machine learning platform; where do I start?"
My short answer to all these questions is always: "It depends." My long answers often involve hours of discussion to understand where the questioner comes from, what they're actually trying to achieve, and the pros and cons of different approaches for their specific use case.
ML systems are both complex and unique. They are complex because they consist of many different components (ML algorithms, data, business logic, evaluation metrics, underlying infrastructure, etc.) and involve many different stakeholders (data scientists, ML engineers, business leaders, users, even society at large). ML systems are unique because they are data dependent, and data varies wildly from one use case to the next.
For example, two companies might be in the same domain (ecommerce) and have the same problem that they want ML to solve (recommender system), but their resulting ML systems can have different model architectures, use different sets of features, be evaluated on different metrics, and bring different returns on investment.
Many blog posts and tutorials on ML production focus on answering one specific question. While this focus helps get the point across, it can create the impression that each of these questions can be considered in isolation. In reality, changes in one component will likely affect other components. Therefore, it's necessary to consider the system as a whole when making any design decision.
This book takes a holistic approach to ML systems. It takes into account different components of the system and the objectives of different stakeholders involved. The content in this book is illustrated using actual case studies, many of which I've personally worked on, backed by ample references, and reviewed by ML practitioners in both academia and industry. Sections that require in-depth knowledge of a certain topic—e.g., batch processing versus stream processing, infrastructure for storage and compute, and responsible AI—are further reviewed by experts whose work focuses on that one topic. In other words, this book is an attempt to give nuanced answers to the questions mentioned above and more.
When I first wrote the lecture notes that laid the foundation for this book, I thought I wrote them for my students to prepare them for the demands of their future jobs as data scientists and ML engineers. However, I soon realized that I also learned tremendously through the process. The initial drafts I shared with early readers sparked many conversations that tested my assumptions, forced me to consider different perspectives, and introduced me to new problems and new approaches.
I hope that this learning process will continue for me now that the book is in your hands, as you have experiences and perspectives that are unique to you. Please feel free to share with me any feedback you might have for this book!
Chip Huyen (https://huyenchip.com) is a co-founder of Claypot AI, a platform for real-time machine learning. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world's largest organizations develop and deploy machine learning systems. She teaches CS 329S: Machine Learning Systems Design at Stanford, whose lecture notes this book is based on.
LinkedIn included her among Top Voices in Software Development (2019) and Top Voices in Data Science & AI (2020). She is also the author of four bestselling Vietnamese books, including the series Xach ba lo len va Di (Pack Your Bag and Go), and runs a Discord server on MLOps with over 6,000 members (https://discord.com/invite/Mw77HPrgjF).









