Batch, Real-Time, and LLM Systems
Jim Dowling

#Machine_Learning
#Feature_Store
#LLM
#Batch
#ML
#MLOps
Get up to speed on a new unified approach to building machine learning (ML) systems with a feature store. Using this practical book, data scientists and ML engineers will learn in detail how to develop and operate batch, real-time, and agentic ML systems.
Author Jim Dowling introduces fundamental principles and practices for developing, testing, and operating ML and AI systems at scale. You'll see how any AI system can be decomposed into independent feature, training, and inference pipelines connected by a shared data layer. Through example ML systems, you'll tackle the hardest part of ML systems: the data. You'll learn how to transform data into features and embeddings, and how to design a data model for AI.
Table of Contents
Part I. The FTI Pipeline Architecture for Machine Learning Systems
Chapter 1. Building Machine Learning Systems
Chapter 2. Machine Learning Pipelines
Chapter 3. Your Friendly Neighborhood Air Quality Forecasting Service
Part II. Feature Stores
Chapter 4. Feature Stores
Chapter 5. Hopsworks Feature Store
Part III. Data Transformations
Chapter 6. Model-Independent Transformations
Chapter 7. Model-Dependent and On-Demand Transformations
Chapter 8. Batch Feature Pipelines
Chapter 9. Streaming and Real-Time Features
Part IV. Training Models
Chapter 10. Training Pipelines
Part V. Inference and Agents
Chapter 11. Inference Pipelines
Chapter 12. Agents and LLM Workflows
Part VI. MLOps and LLMOps
Chapter 13. Testing AI Systems
Chapter 14. Observability and Monitoring AI Systems
Chapter 15. TikTok's Personalized Recommender: The World's Most Valuable AI System
Review
Here's what some builders in the Data and AI space have to say about it:
"This book shows how modern feature engineering is really done. It bridges the gap between research and production. A must-read for anyone serious about building efficient, real-world ML systems"
- Ritchie Vink, Creator of Polars, CEO & Founder Polars Inc
"Jim does a great job explaining the crucial systems aspects to ML and gives a lot of practical tips on how to navigate production ML deployments".
- Hannes Mühleisen, Co-Creator of DuckDB, CEO of DuckDB Labs.
"Building machine learning systems in production has historically involved a lot of black magic and undocumented learnings. Jim Dowling is doing a great service to ML practitioners by sharing the best practices and putting together a clear step-by-step guide."
- Erik Bernhardsson, Inventor of Luigi and Modal. Founder and CEO at Modal.
"The truly hard part of ML is building the scalable, reliable data systems that power them. Jim is one of the few people who can explain system level challenges with exceptional clarity. This book is the definitive, practical guide for bridging the gap from research to real world production grade systems."
- Willem Pienaar, Inventor of Feast Feature Store
"Jim's the closest thing we have to a world-class expert. Read this book if you want a detailed, practical, re-usable manual on how to get a good-quality running system - as an SRE, I especially appreciate his attention to observability and debugging. The detailed case studies are crunchy icing on a filling cake."
- Niall Murphy, O'Reilly Author, SRE legend
"A must-read for AI/ML practitioners looking to match use cases to the right ML platforms and tools. "
- Lalith Suresh, Co-Creator of Feldera.
"Nobody has captured before the essentials of building AI apps using modern data streaming systems like Flink. Jim's book shows the way!".
- Paris Carbone, Apache Flink SIGMOD Winner
"I witnessed the rise of feature stores at Uber, where ML-powered products operated on batch and real-time data. Jim Dowling helped define the category, and this book gives every engineer a practical playbook for shipping production-grade ML systems that matter."
About the Author
Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He has led the development of Hopsworks, which includes the first open-source feature store for machine learning. He has a unique background at the intersection of data and AI. For data, he worked at MySQL and later led the development of HopsFS, a distributed file system that won the IEEE Scale Prize in 2017. For AI, his PhD introduced Collaborative Reinforcement Learning, and he developed and taught the first course on Deep Learning in Sweden in 2016. He also released a popular online course on serverless machine learning using Python at serverless-ml.org. This combined background in data and AI helped him realize the vision of a feature store for machine learning based on general-purpose programming languages, rather than on DSLs as in the earlier feature store work at Uber. He was the first evangelist for feature stores, helping to create the feature store product category through educational articles and talks at industry conferences such as Data/AI Summit, PyData, and OSDC. He is the organizer of the annual feature store summit conference and the featurestore.org community, as well as co-organizer of PyData Stockholm.









