3.4 The tidyverse

The tidyverse is a collection of packages developed by a team of programmers led by Hadley Wickham that share the same philosophy and grammar (Wickham & Grolemund, 2017). The main goal of the tidyverse is to favor an easy and comprehensive approach to data science by solving the challenges that programmers usually face when coding in R (Wickham et al., 2019). Some of the core packages of the tidyverse (e.g., readr, tidyr, dplyr, ggplot2) have names that hint at their functionality: they provide solutions to import, tidy, manipulate, and visualize data, respectively.

3.4.1 The pipe (%>%)

The “pipe” operator is used to connect multiple verb actions into a pipeline; that is, we can chain or pipe functions to perform a sequence of operations (e.g., data wrangling). The pipe comes from the package magrittr, which offers a set of operators that make your R code more readable by:

  • structuring sequences of data operations left-to-right (as opposed to from the inside out).
  • avoiding nested function calls.
  • minimizing the need for local variables and function definitions.
  • making it easy to add steps anywhere in the sequence of operations.

The operators pipe their left-hand side values forward into expressions that appear on the right-hand side; i.e., one can replace f(x) with x %>% f(), where %>% is the (main) pipe-operator. The flow of the R code will be as follows:

data %>% function_1() %>% function_2() %>% function_3() ...
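As a minimal sketch of this flow, the example below writes the same computation as a nested call and as a pipeline. The vector x and the chosen functions are illustrative assumptions, not part of the data sets used in this chapter.

```r
library(magrittr)  # provides the %>% operator

# Hypothetical input vector, invented for illustration
x <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: read left to right, one step at a time
piped <- x %>%
  sqrt() %>%    # square root of each value
  mean() %>%    # average of the square roots
  round(1)      # round to one decimal place

identical(nested, piped)  # both versions return 2
```

Both versions compute the same value; the piped version simply lists the steps in the order in which they are applied.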

Keyboard shortcut for the pipe operator in RStudio

    • Windows: Press Ctrl + Shift + M
    • macOS: Press Cmd + Shift + M

3.4.2 The package dplyr

The package dplyr was developed by Hadley Wickham and colleagues. It is an optimized and distilled version of an old package named plyr. The package dplyr does not provide any “new” functionality to R per se. Everything dplyr does could already be done with base R, but it greatly simplifies existing functionality in R (Wickham et al., 2022).

One important contribution of the package dplyr is that it provides a “grammar” (in particular, verbs) for data manipulation and for operating on data frames. With this “grammar,” other people can understand what we are doing to a data frame. This approach is very useful because it provides an abstraction for data manipulation that did not exist before. Another useful contribution of the package dplyr is that its functions are very fast, because many key operations are coded in C++.

dplyr verbs

    • select: It returns a subset of the columns of a data frame, using a flexible notation
    • filter: It extracts a subset of rows from a data frame based on logical conditions
    • relocate: It reorders the columns of a data frame
    • arrange: It reorders the rows of a data frame
    • rename: It renames variables in a data frame
    • mutate: It adds new variables/columns or transforms existing variables
    • summarize: It generates summary statistics of different variables in the data frame
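The verbs above can be chained with the pipe. The sketch below applies each of them to a small toy data frame; the data frame scores and its values are hypothetical and were invented for illustration only.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
scores <- data.frame(
  id = 1:6,
  sex = c("female", "male", "female", "male", "female", "male"),
  age = c(30, 26, 19, 26, 30, 22),
  depression = c(13, 11, 9, 22, 29, 15)
)

result <- scores %>%
  filter(age >= 20) %>%                     # keep rows that meet a condition
  select(id, sex, depression) %>%           # keep a subset of columns
  rename(dep = depression) %>%              # rename a column (new = old)
  mutate(dep_centered = dep - mean(dep)) %>% # add a new column
  relocate(sex, .before = id) %>%           # move a column
  arrange(desc(dep))                        # sort rows, highest score first

# Collapse the data frame into one row of summary statistics
overall <- result %>%
  summarize(mean_dep = mean(dep), n = n())

result
overall
```

Each verb takes a data frame as its first argument and returns a data frame, which is what allows the steps to be piped one after another.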

3.4.3 The package tidyr

The package tidyr was also developed by Hadley Wickham and colleagues. It was created to tidy data; i.e., to arrange a data set so that every column represents a variable, every row contains the information of one participant/observation, and every cell holds a single value. The package tidyr is widely used to transform data frames from wide to long format and vice versa. This reshaping relies on pivoting the data set, either to collapse several columns into fewer columns (pivot_longer()) and generate a long-format data set, or to expand a few columns into more columns (pivot_wider()) and create a wide-format data set.
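A minimal sketch of both pivoting directions is shown below. The data frame wide and its columns dep_2020 and dep_2021 are hypothetical names invented for this illustration; they are not taken from the data sets used in this chapter.

```r
library(tidyr)

# Hypothetical wide-format data: one row per participant,
# one depression column per year
wide <- data.frame(
  id = 1:3,
  dep_2020 = c(29, 13, 11),
  dep_2021 = c(25, 15, 9)
)

# Wide to long: collapse the two year columns into a
# "year" column and a "depression" column
long <- pivot_longer(
  wide,
  cols = c(dep_2020, dep_2021),
  names_to = "year",
  names_prefix = "dep_",   # strip the prefix so year holds "2020"/"2021"
  values_to = "depression"
)

# Long to wide: spread the years back into separate columns
wide_again <- pivot_wider(
  long,
  names_from = year,
  values_from = depression,
  names_prefix = "dep_"    # restore the original column names
)

long
wide_again
```

In this sketch the long version has one row per participant and year (six rows), and pivoting it back recovers the original three-row layout.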

Let's draw five random observations from the simulated data set affective.dis, a wide-format data set in which each participant's responses were entered in a single row across multiple columns. We will use the function slice_sample() from the package dplyr to subset n random rows, rather than the function filter(), which requires setting conditions to keep some rows while dropping others (e.g., filter(sex == 'female')).


# Draw five rows at random (the rows change on every run unless a seed is set)
affective.wide1 <- affective.dis %>%
  slice_sample(n = 5)

affective.wide1
##   id treatment year    sex age depression life.satis  satis friends alone
## 1  7         1 2020   male  30         29   32.00596 Medium       9     0
## 2 22         2 2020 female  30         13   44.87210 Medium       3     0
## 3 23         2 2020   male  26         11   49.37174   High      19     0
## 4 29         3 2020   male  19          9   58.75117   High       2     0
## 5 12         1 2020 female  26         22   48.86415   High      13     0
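Because slice_sample() draws rows at random, the rows returned will usually differ from one run to the next. The sketch below, which uses a hypothetical data frame toy in place of affective.dis, shows one way to make the draw reproducible with set.seed(), and contrasts it with slice_head() and filter(). The seed value 123 is arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for a real data set, invented for illustration
toy <- data.frame(
  id = 1:30,
  sex = rep(c("female", "male"), times = 15)
)

# Setting the same seed before each draw returns the same random rows
set.seed(123)
sample_a <- toy %>% slice_sample(n = 5)
set.seed(123)
sample_b <- toy %>% slice_sample(n = 5)
identical(sample_a, sample_b)

# slice_head() keeps the first n rows instead of random ones
first_five <- toy %>% slice_head(n = 5)

# filter() keeps the rows that meet a logical condition
females <- toy %>% filter(sex == "female")
```

Which function to use depends on the goal: slice_sample() for a random subset, slice_head() for the first rows, and filter() for rows that satisfy a condition.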
