A Beginner’s Guide for Building Datasets for Analysis
Renée M. P. Teate
Jump-start your career as a data scientist: learn to develop datasets for exploration, analysis, and machine learning
SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is a resource dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to construct datasets for exploration, analysis, and machine learning. You will also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls.
You may be one of the many people entering the field of data science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources but never retrieved and engineered datasets from a relational database using SQL, a programming language designed for managing databases and extracting data.
This guide for data scientists differs from other instructional guides on the subject. It doesn’t cover SQL broadly. Instead, you’ll learn the subset of SQL skills that data analysts and data scientists use frequently. You’ll also gain practical advice and direction on "how to think about constructing your dataset."
In this book, author Renée Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner’s perspective to help you move your data science career forward.
Contents
Chapter 1: Data Sources
Chapter 2: The SELECT Statement
Chapter 3: The WHERE Clause
Chapter 4: CASE Statements
Chapter 5: SQL JOINs
Chapter 6: Aggregating Results for Analysis
Chapter 7: Window Functions and Subqueries
Chapter 8: Date and Time Functions
Chapter 9: Exploratory Data Analysis with SQL
Chapter 10: Building SQL Datasets for Analytical Reporting
Chapter 11: More Advanced Query Structures
Chapter 12: Creating Machine Learning Datasets Using SQL
Chapter 13: Analytical Dataset Development Examples
Chapter 14: Storing and Modifying Data
About the Author
Renée M. P. Teate is the Director of Data Science at HelioCampus, leading a team that builds predictive models for colleges and universities. She has worked with data professionally since 2004, in roles including relational database design, data-driven website development, data analysis and reporting, and data science. With degrees in Integrated Science and Technology from James Madison University and Systems Engineering from the University of Virginia, along with a varied career working with data at every stage in a number of systems, she considers herself to be a “data generalist”.
Renée regularly speaks at technology and higher ed conferences and meetups, and writes in industry publications about her data science work and about navigating data science career paths. She also created the “Becoming a Data Scientist” podcast and the @BecomingDataSci Twitter account, where she’s known to her more than 60k followers as “Data Science Renee”. She always tells aspiring data scientists to learn SQL, since it has been one of the most valuable and enduring skills needed throughout her career.