Import, Tidy, Transform, Visualize, and Model Data
Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
Use R to turn data into insight, knowledge, and understanding. With this practical book, aspiring data scientists will learn how to do data science with R and RStudio, along with the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Even if you have no programming experience, this updated edition will have you doing data science quickly.
You'll learn how to import, transform, and visualize your data and communicate the results. And you'll get a complete, big-picture understanding of the data science cycle and the basic tools you need to manage the details. Updated for the latest tidyverse features and best practices, new chapters show you how to get data from spreadsheets, databases, and websites. Exercises help you practice what you've learned along the way.
Table of Contents
Part I. Whole Game
Chapter 1. Data Visualization
Chapter 2. Workflow: Basics
Chapter 3. Data Transformation
Chapter 4. Workflow: Code Style
Chapter 5. Data Tidying
Chapter 6. Workflow: Scripts and Projects
Chapter 7. Data Import
Chapter 8. Workflow: Getting Help
Part II. Visualize
Chapter 9. Layers
Chapter 10. Exploratory Data Analysis
Chapter 11. Communication
Part III. Transform
Chapter 12. Logical Vectors
Chapter 13. Numbers
Chapter 14. Strings
Chapter 15. Regular Expressions
Chapter 16. Factors
Chapter 17. Dates and Times
Chapter 18. Missing Values
Chapter 19. Joins
Part IV. Import
Chapter 20. Spreadsheets
Chapter 21. Databases
Chapter 22. Arrow
Chapter 23. Hierarchical Data
Chapter 24. Web Scraping
Part V. Program
Chapter 25. Functions
Chapter 26. Iteration
Chapter 27. A Field Guide to Base R
Part VI. Communicate
Chapter 28. Quarto
Chapter 29. Quarto Formats
Hadley Wickham is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (ggplot2, dplyr, tidyr), data ingest (readr, readxl, haven), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his homepage, http://hadley.nz.
Mine Çetinkaya-Rundel is Professor of the Practice and Director of Undergraduate Studies at the Department of Statistical Science and an affiliated faculty member in the Computational Media, Arts, and Cultures program at Duke University, as well as an Educator at RStudio. Mine works on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education. At RStudio, Mine's work focuses primarily on education for open-source R packages as well as building resources and tools for educators teaching statistics and data science with R and RStudio. Mine has authored four undergraduate statistics textbooks as part of the OpenIntro projects, teaches the popular MOOC Statistics with R on Coursera, and is the developer and maintainer of Data Science in a Box. Mine is a Fellow of the ASA and an Elected Member of the ISI, as well as the recipient of the 2021 Robert V. Hogg Award for Excellence in Teaching Introductory Statistics, the 2018 Harvard Pickard Award, and the 2016 ASA Waller Education Award.
Garrett Grolemund is the author of Hands-On Programming with R and co-author of R for Data Science and R Markdown: The Definitive Guide. He is Director of Learning at RStudio and holds a Ph.D. in Statistics, but specializes in teaching. He’s taught people how to use R at over 50 government agencies, small businesses, and multibillion-dollar global companies, and he’s designed RStudio’s training materials for R, Shiny, R Markdown, and more. Garrett wrote the popular lubridate package for dates and times in R and creates the RStudio cheat sheets.