From Lab to Production
Trevor Grant, Holden Karau, Boris Lublinsky, Richard Liu, Ilan Filonenko

#Kubeflow
#Machine_Learning
#TensorFlow
#Keras
#Scikit-Learn
If you're training a machine learning model but aren't sure how to put it into production, this book will get you there. Kubeflow provides a collection of cloud native tools for different stages of a model's lifecycle, from data exploration, feature preparation, and model training to model serving. This guide helps data scientists build production-grade machine learning implementations with Kubeflow and shows data engineers how to make models scalable and reliable.
Using examples throughout the book, authors Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky explain how to use Kubeflow to train and serve your machine learning models on top of Kubernetes in the cloud or in a development environment on-premises.
We wrote this book for data engineers and data scientists who are building machine learning systems/models they want to move to production. If you’ve ever had the experience of training an excellent model only to ask yourself how to deploy it into production or keep it up to date once it gets there, this is the book for you. We hope this gives you the tools to replace Untitled_5.ipynb with something that works relatively reliably in production.
This book is not intended to serve as your first introduction to machine learning. The next section points to some resources that may be useful if you are just getting started on your machine learning journey.
This book assumes that you either understand how to train models locally, or are working with someone who does. If neither is true, there are many excellent introductory books on machine learning to get you started, including Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by Aurélien Géron (O’Reilly).
Our goal is to teach you how to do machine learning in a repeatable way, and how to automate the training and deployment of your models. The challenge is that this goal spans a wide range of topics, and it is entirely reasonable that you may not be intimately familiar with all of them.
Trevor Grant is a member of the Apache Software Foundation, and is heavily involved in the Apache Mahout, Apache Streams, and Community Development projects. He often tinkers and occasionally documents his (mis)adventures at www.rawkintrevo.org. In the before time, he was an international speaker on technology, but now he focuses mainly on writing. Trevor wishes to thank IBM for their continued patronage of his artistic endeavors. He lives in Chicago because it's the best city on the planet, with world-class food, parks, and culture, and because the skies are never orange.
Holden Karau is a queer transgender Canadian, Apache Spark committer, Apache Software Foundation member, and an active open source contributor. She also extends her passion for building community with industry projects including Scaling for Python for ML and teaching distributed computing to children. As a software engineer, she's worked on a variety of distributed compute, search, and classification problems at Google, IBM, Alpine, Databricks, Foursquare, and Amazon. She graduated from the University of Waterloo with a bachelor of mathematics in computer science. Outside of software she enjoys playing with fire, welding, riding scooters, eating poutine, and dancing.
Boris Lublinsky is a Principal Architect at Lightbend. Boris has over 25 years of experience in enterprise, technical architecture, and software engineering. He is an active member of the OASIS SOA RM committee, co-author of Applied SOA: Service-Oriented Architecture and Design Strategies (Wiley), and author of numerous articles on architecture, programming, big data, SOA, and BPM.
Richard Liu is a Senior Software Engineer at Waymo, where he focuses on building a machine learning platform for self-driving cars. Previously, he worked at Microsoft Azure and Google Cloud. He is one of the primary maintainers of the Kubeflow project and has given several talks at KubeCon. He holds a Master's degree in Computer Science from the University of California, San Diego.
Ilan Filonenko is a member of the Data Science Infrastructure team at Bloomberg, where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical lead in various startups and research divisions across multiple industry verticals, including medicine, hospitality, finance, and music. He actively contributes to open source, primarily Apache Spark and Kubeflow’s KFServing. He is one of the principal contributors to Spark on Kubernetes, where he focuses primarily on remote shuffle and HDFS security, and to multi-model serving in KFServing. Ilan’s research has been in algorithmic, software, and hardware techniques for high-performance machine learning with a focus on optimizing stochastic algorithms and model management.









