Organize, Test, Document, and Share Your Code
Hadley Wickham, Jennifer Bryan
Turn your R code into packages that others can easily install and use. With this fully updated edition, developers and data scientists will learn how to bundle reusable R functions, sample data, and documentation together by applying the package development philosophy used by the team that maintains the "tidyverse" suite of packages. In the process, you'll learn how to automate common development tasks using a set of R packages, including devtools, usethis, testthat, and roxygen2.
Authors Hadley Wickham and Jennifer Bryan from Posit (formerly known as RStudio) help you create packages quickly, then teach you how to get better over time. You'll be able to focus on what you want your package to do as you progressively develop greater mastery of the structure of a package.
With this book, you will:
Table of Contents
Part I. Getting Started
Chapter 1. The Whole Game
Chapter 2. System Setup
Chapter 3. Package Structure and State
Chapter 4. Fundamental Development Workflows
Chapter 5. The Package Within
Part II. Package Components
Chapter 6. R Code
Chapter 7. Data
Chapter 8. Other Components
Part III. Package Metadata
Chapter 9. DESCRIPTION
Chapter 10. Dependencies: Mindset and Background
Chapter 11. Dependencies: In Practice
Chapter 12. Licensing
Part IV. Testing
Chapter 13. Testing Basics
Chapter 14. Designing Your Test Suite
Chapter 15. Advanced Testing Techniques
Part V. Documentation
Chapter 16. Function Documentation
Chapter 17. Vignettes
Chapter 18. Other Markdown Files
Chapter 19. Website
Part VI. Maintenance and Distribution
Chapter 20. Software Development Practices
Chapter 21. Lifecycle
Chapter 22. Releasing to CRAN
Welcome to R Packages by Hadley Wickham and Jennifer Bryan. Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this book you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first, so start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.
Introduction
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests and is easy to share with others. As of March 2023, there were over 19,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearinghouse for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem you’re working on, and you can benefit from their work by downloading their package.
If you’re reading this book, you already know how to work with packages in the following ways:
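As a reminder, everyday package use usually comes down to a few calls like the ones sketched here. The choice of `stats` (a package that ships with R) is an assumption made only so the sketch runs without network access; with a package from CRAN, the `install.packages()` branch is the one that would download and install it.

```r
# "stats" ships with R and stands in for any package name here
# (an assumption for illustration, so nothing has to be downloaded).
if (!requireNamespace("stats", quietly = TRUE)) {
  # Reached only when the package is missing: install it from CRAN.
  install.packages("stats")
}

# Attach the package so its functions can be called by name.
library(stats)

# Look up the package-level help index (stored rather than displayed).
info <- help(package = "stats")

# Call a function that the package provides.
m <- median(c(1, 3, 10))
```

In an interactive session you would normally let `help(package = "stats")` display its result instead of storing it.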
The goal of this book is to teach you how to develop packages so that you can write your own, not just use other people’s. Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it, and learn how to use it.
But packages are useful even if you never share your code. As Hilary Parker says in her introduction to packages: “Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time.” Organizing code in a package makes your life easier because packages come with conventions. For example, you put R code in R/, you put tests in tests/, and you put data in data/. These conventions are helpful because:
They save time—you don’t need to think about the best way to organize a project, you can just follow a template.
Standardized conventions lead to standardized tools—if you buy into R’s package conventions, you get many tools for free.
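To make the directory conventions concrete, here is a minimal sketch that lays out `R/`, `tests/`, and `data/` in a temporary folder using only base R. The package name `mypkg` and the function `add_one()` are hypothetical, and the result is not yet an installable package (it has no DESCRIPTION file, for instance); in practice a helper such as `usethis::create_package()` sets up a complete skeleton.

```r
# Hypothetical package name, used only for illustration.
pkg <- file.path(tempdir(), "mypkg")

# Create the conventional top-level directories named in the text.
for (d in c("R", "tests", "data")) {
  dir.create(file.path(pkg, d), recursive = TRUE, showWarnings = FALSE)
}

# R code lives in R/ (add_one() is a made-up example function).
writeLines("add_one <- function(x) x + 1",
           file.path(pkg, "R", "add_one.R"))

# List everything that was created, directories included.
layout <- list.files(pkg, recursive = TRUE, include.dirs = TRUE)
print(layout)
```

Because every package follows the same layout, tools can find code, tests, and data without any per-project configuration.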
It’s even possible to use packages to structure your data analyses (e.g., “Packaging Data Analytical Work Reproducibly Using R (and Friends)” in The American Statistician or PeerJ Preprints), although we won’t delve deeply into that use case here.
Hadley Wickham is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (ggplot2, dplyr, tidyr), data ingest (readr, readxl, haven), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his homepage, https://hadley.nz/.
Jennifer Bryan is a Software Engineer at RStudio and a member of the R Foundation. She is part of the tidyverse team that collectively maintains more than 150 R packages. Jennifer maintains packages for importing tabular data (readxl, googlesheets4, readr, vroom), working with Google APIs (googledrive, gargle, gmailr), and simplifying development workflows (reprex, usethis, devtools). In her first career, Jennifer was an Associate Professor of Statistics at the University of British Columbia, where she created courses and programs in what we now know as data science. Learn more on her homepage, https://jennybryan.org/.