Flexible Distributed Python for Machine Learning
Max Pumperla, Edward Oakes, and Richard Liaw

Get started with Ray, the open source distributed computing framework that simplifies the process of scaling compute-intensive Python workloads. With this practical book, Python programmers, data engineers, and data scientists will learn how to leverage Ray locally and spin up compute clusters. You'll be able to use Ray to structure and run machine learning programs at scale.
Authors Max Pumperla, Edward Oakes, and Richard Liaw show you how to build machine learning applications with Ray. You'll understand how Ray fits into the current landscape of machine learning tools and discover how Ray continues to integrate ever more tightly with these tools. Distributed computation is hard, but by using Ray you'll find it easy to get started.
Table of Contents
Chapter 1. An Overview of Ray
Chapter 2. Getting Started with Ray Core
Chapter 3. Building Your First Distributed Application
Chapter 4. Reinforcement Learning with Ray RLlib
Chapter 5. Hyperparameter Optimization with Ray Tune
Chapter 6. Data Processing with Ray
Chapter 7. Distributed Training with Ray Train
Chapter 8. Online Inference with Ray Serve
Chapter 9. Ray Clusters
Chapter 10. Getting Started with the Ray AI Runtime
Chapter 11. Ray's Ecosystem and Beyond
It’s likely that you picked up this book because you’re interested in some aspects of Ray. Maybe you’re a distributed systems engineer who wants to know how Ray’s engine works. You might also be a software developer interested in picking up a new technology. Or you could be a data engineer who wants to evaluate how Ray compares to similar tools. You could also be a machine learning practitioner or data scientist who needs to find ways to scale experiments.
No matter your specific role, the common denominator for getting the most out of this book is being comfortable programming in Python. This book's examples are written in Python, and intermediate knowledge of the language is a requirement. Explicit is better than implicit, as you know full well as a Pythonista. So, let us be explicit: by knowing Python we mean that you know how to use the command line on your system, how to get help when stuck, and how to set up a programming environment on your own.
If you’ve never worked with distributed systems before, that’s OK. We cover all the basics you need to get started in this book. On top of that, you can run most of the code examples presented here on your laptop. Because we stick to the basics, we can’t go into too much detail about distributed systems as such; this book is ultimately aimed at application developers using Ray, specifically for data science and ML.
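To give you a flavor of what running Ray locally looks like, here is a minimal sketch of a Ray program, assuming Ray is installed (for example, via pip install ray). The task name square is our own illustration and not taken from the book's examples.

import ray

ray.init()  # start a local Ray "cluster" right on your laptop

@ray.remote
def square(x):
    # An ordinary Python function, turned into a distributed task.
    return x * x

# Schedule four tasks in parallel and collect their results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

The same decorator-based API carries over when ray.init() connects to a remote cluster instead, which is part of why Ray programs scale from a laptop to a cluster with little code change.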
For the later chapters of this book, you’ll need some familiarity with ML, but we don’t expect you to have worked in the field. In particular, you should have a basic understanding of the ML paradigm and how it differs from traditional programming, and you should know the basics of using NumPy and Pandas. You should also feel comfortable reading examples that use the popular TensorFlow and PyTorch libraries. It’s enough to follow the flow of the code at the API level; you don’t need to know how to write your own models. We cover examples using both dominant deep learning libraries to illustrate how you can use Ray for ML workloads regardless of your preferred framework.
We cover a lot of ground in advanced ML topics, but the main focus is on Ray as a technology and how to use it. The ML examples we discuss might be new to you and could require a second reading, but you can still focus on Ray’s API and how to use it in practice.
Whatever your role, you can learn all of these topics, and our hope is that by the end of this book you will have learned to appreciate Ray for all its strengths.
Goals of This Book
This book was written primarily for readers who are new to Ray and want to get the most out of it quickly. We chose the material in such a way that you will understand the core ideas behind Ray and learn to use its main building blocks. Having read it, you will feel comfortable navigating more complex topics on your own that go beyond this introduction.
We should also be clear about what this book is not. It’s not built to give you the most information possible, like API references or definitive guides. It’s also not crafted to help you tackle concrete tasks, like how-to guides or cookbooks do. This book is focused on learning and understanding Ray and giving you interesting examples to start with.
Software evolves and becomes outdated quickly, but the fundamental concepts it is built on often remain stable even across major release cycles. We’re trying to strike a balance here between conveying ideas and providing you with concrete code examples. Ideally, the ideas you find in this book will remain useful even when the code eventually needs updating.
While Ray’s documentation keeps getting better, we believe that books can offer qualities that are difficult to match in a project’s documentation. Since you’re reading these lines, we realize we might be preaching to the choir with this statement. But some of the best tech books we know spark interest in a project and make you want to dig through terse API references that you’d never have touched otherwise. We hope this is one of those books.
Max Pumperla is a data science professor and software engineer based in Hamburg, Germany. He is an active open source contributor, the maintainer of several Python packages, and the author of machine learning books. He currently works as a software engineer at Anyscale. As head of product research at Pathmind Inc., he developed reinforcement learning solutions for industrial applications at scale using Ray RLlib, Serve, and Tune.
Edward Oakes is a software engineer and team lead at Anyscale, where he leads the development of Ray Serve and is one of the top open source contributors to Ray. Prior to Anyscale, he was a graduate student in the EECS department at UC Berkeley.
Richard Liaw is a software engineer at Anyscale, working on open source tools for distributed machine learning. He is on leave from the PhD program in the Computer Science Department at UC Berkeley, where he is advised by Joseph Gonzalez, Ion Stoica, and Ken Goldberg.









