Doing More with Less Data
Wee Hyong Tok, Amit Bahree, and Senja Filipi

#Data
#ML
#AI
Most data scientists and engineers today rely on quality labeled data to train machine learning models. But building a training set manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Wee Hyong Tok, Amit Bahree, and Senja Filipi show you how to create products using weakly supervised learning models.
You'll learn how to build natural language processing and computer vision projects using weakly labeled datasets from Snorkel, a spin-off from the Stanford AI Lab. Because so many companies have pursued ML projects that never go beyond their labs, this book also provides a guide on how to ship the deep learning models you build.
Table of Contents
Chapter 1. Introduction to Weak Supervision
Chapter 2. Diving into Data Programming with Snorkel
Chapter 3. Labeling in Action
Chapter 4. Using the Snorkel-labeled Dataset for Text Classification
Chapter 5. Using the Snorkel-labeled Dataset for Image Classification
Chapter 6. Scalability and Distributed Training
Getting quality labeled data for supervised learning is an important step toward training performant machine learning models. In many real-world projects, getting labeled data often takes up a significant amount of time. Weak supervision is emerging as an important catalyst for enabling data science teams to fuse insights from heuristics and crowd-sourcing to produce weakly labeled datasets that can be used as inputs for machine learning and deep learning tasks.
Who Should Read This Book
The primary audience of this book is professional and citizen data scientists who already work on machine learning projects and face the typical challenge of getting good-quality labeled data for them. Readers should have a working knowledge of Python and be familiar with machine learning libraries and tools.
About the Authors
Wee Hyong Tok has an extensive track record as a product and data science leader, with a background in product management, machine learning, deep learning, and research.
Amit Bahree is an accomplished engineering and technology leader with 25 years of experience and a proven ability to build and grow multiple products and teams.
Senja Filipi has more than a decade of experience as a software engineer, half of it spent working on full-stack ML applications.









