From Notebooks to Scalable Systems
Catherine Nelson

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and it is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, and clearly explains how to apply the best practices from software engineering to data science.
Examples are provided in Python, drawn from popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:
Table of Contents
Chapter 1. What Is Good Code?
Chapter 2. Analyzing Code Performance
Chapter 3. Using Data Structures Effectively
Chapter 4. Object-Oriented Programming and Functional Programming
Chapter 5. Errors, Logging, and Debugging
Chapter 6. Code Formatting, Linting, and Type Checking
Chapter 7. Testing Your Code
Chapter 8. Design and Refactoring
Chapter 9. Documentation
Chapter 10. Sharing Your Code: Version Control, Dependencies, and Packaging
Chapter 11. APIs
Chapter 12. Automation and Deployment
Chapter 13. Security
Chapter 14. Working in Software
Chapter 15. Next Steps
Who Is This Book For?
This book is aimed at data scientists, but people working in closely related roles, such as data analysts, machine learning (ML) engineers, and data engineers, will also find it useful. I’ll explain well-established software engineering principles that will be useful to anyone who writes code, but the examples I’ll use to illustrate these principles will be most familiar to data scientists.
I’ve aimed to make this book accessible to data scientists who are relatively new to the field. Maybe you’ve just finished a degree in data science or you’re starting your first job in industry. This book will cover the practical software engineering skills that are not always included in introductory data science courses. Or maybe you didn’t take a formal data science course. Maybe you’re self-taught or you’re moving into data science from math or another science. No matter which route you’re taking into data science, this book is for you.
More experienced data scientists will also learn a great deal, and you’ll find this book especially useful if you’re in a job where you’ll often interact with software developers. You’ll learn the skills that will help you work effectively on a larger codebase and how to write Python code that will work efficiently in production.
I’m assuming that you already know the fundamentals of data science, including data exploration, data visualization, data wrangling, basic ML, and the math skills that go along with these. I’m also assuming that you already know the basics of how to code in Python: how to write functions and control flow statements, and the basics of how to use modules including NumPy, Matplotlib, pandas, and scikit-learn. If these are new to you, I recommend the following books:
This is not a book for software developers who are looking to learn data science and machine learning skills. If this is your situation, I recommend AI and Machine Learning for Coders by Laurence Moroney (O’Reilly, 2020).
This book is the missing link data scientists have long sought, masterfully bridging the gap between data science and software engineering. It offers a clear, actionable guide that fills the crucial skill gap many data scientists face in software engineering, elevating their coding practices to new heights. Truly, this is the book we've been waiting for.
Gabriela de Queiroz, Director of AI, Microsoft; Startup Advisor and Angel Investor
Catherine's book demystifies how to scale your individual work to production capacity. Whether you are a data scientist, developer, or executive, she makes data services at scale accessible. From startup to massive corporate data, following her best practices will set your data projects up for success.
Carol Willing, Core Developer of Python; 2017 ACM Software System Award recipient for Jupyter's lasting influence
I love this book! It's the missing piece on every data scientist's shelf. For years, bootcamps, universities, and industry managers have been trying to get skilled scientists to function more like software engineers. No book bridges that gap, until this one.
Shawn Ling Ramirez, CEO, eloraHQ
Software Engineering for Data Scientists is a must read if you want to take your data science skills from ideas to fully implemented systems. It's a terrific guide to help you through the most important engineering aspects of coding. I wish I'd had this book years ago, it would have saved me countless hours! I thoroughly recommend it.
Laurence Moroney, AI Advocacy Lead, Google
Since its beginnings, data scientists have come from a wide variety of backgrounds in education and experience. While in many ways this has been a strength of the field, often data scientists lack the software engineering skills to work closely with peers from more traditional software development backgrounds. In this book, Catherine Nelson provides a much-needed bridge between the two disciplines, giving data scientists the knowledge to level up their own work and impact.
Chris Albon, Director of Machine Learning, The Wikimedia Foundation
Catherine Nelson is a freelance data scientist and writer. She is currently working on the forthcoming O'Reilly book "Software Engineering for Data Scientists". Previously, she was a Principal Data Scientist at SAP Concur, where she delivered production machine learning applications and developed innovative new features using NLP. She is also co-author of the O'Reilly publication "Building Machine Learning Pipelines", and she is an organizer for Seattle PyLadies, supporting women who code in Python. In her previous career as a geophysicist she studied ancient volcanoes and explored for oil in Greenland. Catherine has a PhD in geophysics from Durham University and a Masters of Earth Sciences from Oxford University.