3.5 Data wrangling with the tidyverse
First of all, we need to load the tidyverse, a suite of packages that includes dplyr, tidyr, and ggplot2. If we load neither the tidyverse nor one of its packages that provide the pipe (dplyr, for example), we will need to load the package magrittr to use the pipe operator %>%.
library(tidyverse)
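As a quick reminder of what the pipe operator does, the following minimal sketch compares a nested call with its piped equivalent. It loads only magrittr, and the vector scores is a made-up example that is not part of affective.dis.

```r
library(magrittr)  # provides %>% without attaching the whole tidyverse

# Hypothetical values, used only to illustrate the pipe
scores <- c(4, 9, 16)

# Nested form: read from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped form: read from left to right; each result becomes the
# first argument of the next function
piped <- scores %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Both forms return the same value; the piped version simply lists the operations in the order in which they are applied.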
We are going to use the simulated data set affective.dis to perform several operations that will end with the estimation of two descriptive statistics (M and SD) of the variables depression and life.satis, for the female participants only and split by the three experimental conditions of the variable treatment.
First, the data set affective.dis will be piped to the first operation: to select the variables id, treatment, sex, depression, and life.satis. Second, the resulting data set will be piped to the second operation: to filter a subset of rows based on the logical condition sex == 'female' (i.e., keeping only the rows of the female participants). Third, we will group the data by the three experimental conditions of the variable treatment, so that the following operations are carried out separately for each group. Last, we will compute the means and standard deviations of the variables depression and life.satis.
affective.dis %>%
  select(id, treatment, sex, depression, life.satis) %>%
  filter(sex == 'female') %>%
  group_by(treatment) %>%
  summarise(mean_depression = mean(depression),
            sd_depression = sd(depression),
            mean_life.satis = mean(life.satis),
            sd_life.satis = sd(life.satis))
## # A tibble: 3 × 5
##   treatment mean_depression sd_depression mean_life.satis sd_life.satis
##       <int>           <dbl>         <dbl>           <dbl>         <dbl>
## 1         1              18          4.65            41.5          12.5
## 2         2            9.83          3.06            41.7          9.47
## 3         3             6.5          3.51            51.2          9.90
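The same filter, group, and summarise pattern can be tried on a small data frame that is easy to check by hand. The sketch below is only an illustration: the data frame toy and its values are made up, and it assumes dplyr 1.0.0 or later so that across() is available. Instead of writing one line per statistic, across() applies mean() and sd() to both variables and names the resulting columns as variable_statistic.

```r
library(dplyr)

# Hypothetical data, NOT the affective.dis data set
toy <- data.frame(
  id         = 1:6,
  treatment  = c(1, 1, 1, 2, 2, 2),
  sex        = c("female", "male", "female", "female", "female", "male"),
  depression = c(20, 15, 16, 10, 8, 12),
  life.satis = c(40, 50, 44, 52, 48, 30)
)

toy_summary <- toy %>%
  filter(sex == "female") %>%          # keep the female participants only
  group_by(treatment) %>%              # one summary row per condition
  summarise(across(c(depression, life.satis),
                   list(mean = mean, sd = sd)))

toy_summary
```

If a variable contained missing values, mean() and sd() would return NA unless na.rm = TRUE were passed to them; the toy data above has no missing values, so this is not needed here.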
Sometimes, we are interested in using pipes to chain sequential operations on our data sets, but the final goal is not to produce an output, but to modify a data set and to keep those changes. In these cases, we will use the assignment operator to save these changes into the same R object or into a different one. For example, in the following R code, (1) we will drop the variable year because it is a constant, (2) we will transform the variable treatment into a factor, assigning labels to its three experimental conditions, (3) we will move the variables sex and age to the second and third columns respectively, and the variable satis after the variable life.satis, (4) instead of overwriting the former R object affective.dis with the new data set, we will assign the new data set to the R object affective.dis2, (5) we will rename the variable friends to convey that the information stored in that column is the number of friends (n.friends) that each participant has, and (6) we will reorder the rows of the new data set from the first (id = 1) to the last (id = 36) participant.
affective.dis2 <- affective.dis %>%
  select(-year) %>%
  mutate(treatment = factor(treatment,
                            labels = c('Exercise',
                                       'Mindfulness',
                                       'CBT'))) %>%
  relocate(sex, age, .after = id) %>%
  relocate(satis, .after = life.satis) %>%
  rename(n.friends = friends) %>%
  arrange(id)
affective.dis2
##    id    sex age   treatment depression life.satis  satis n.friends alone
##  1  1   male  23    Exercise         22   53.50930   High         0     0
##  2  2 female  19    Exercise         19   42.10586 Medium         4     0
##  3  3   male  24    Exercise         14   22.73856    Low         2     1
##  4  4 female  21    Exercise         23   18.42421    Low        11     0
##  5  5   male  20    Exercise         16   68.70935   High         2     1
##  6  6 female  21    Exercise         11   40.22030 Medium         0     1
##  7  7   male  30    Exercise         29   32.00596 Medium         9     0
##  8  8 female  25    Exercise         19   44.29188 Medium        11     1
##  9  9   male  30    Exercise         17   50.12402   High        19     1
## 10 10 female  19    Exercise         14   54.88683   High        18     1
## 11 11   male  27    Exercise         20   40.08320 Medium        18     1
## 12 12 female  26    Exercise         22   48.86415   High        13     0
## 13 13   male  28 Mindfulness         11   31.99603 Medium        15     0
## 14 14 female  26 Mindfulness          8   29.28087 Medium         1     0
## 15 15   male  29 Mindfulness          5   64.95737   High        10     0
## 16 16 female  22 Mindfulness          7   52.87438   High         3     1
## 17 17   male  19 Mindfulness         13   38.76411 Medium        10     1
## 18 18 female  29 Mindfulness         10   31.53965 Medium         5     0
## 19 19   male  30 Mindfulness          9   39.57584 Medium        19     0
## 20 20 female  30 Mindfulness          7   49.12089   High         9     1
## 21 21   male  25 Mindfulness         16   61.50143   High        12     1
## 22 22 female  30 Mindfulness         13   44.87210 Medium         3     0
## 23 23   male  26 Mindfulness         11   49.37174   High        19     0
## 24 24 female  23 Mindfulness         14   42.72718 Medium        19     0
## 25 25   male  24         CBT          8   40.50052 Medium        16     0
## 26 26 female  24         CBT         12   55.23754   High         2     0
## 27 27   male  30         CBT          7   50.43363   High         7     0
## 28 28 female  29         CBT          4   37.22818 Medium        12     0
## 29 29   male  19         CBT          9   58.75117   High         2     0
## 30 30 female  25         CBT          7   52.71514   High         2     0
## 31 31   male  24         CBT         11   37.55278 Medium        15     0
## 32 32 female  29         CBT          3   58.79335   High        12     0
## 33 33   male  30         CBT          5   19.83120    Low         7     0
## 34 34 female  18         CBT          9   62.02641   High        10     0
## 35 35   male  21         CBT         14   19.23629    Low        16     0
## 36 36 female  18         CBT          4   41.11086 Medium         4     0
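Two details of this pipeline are worth checking on a small example: factor() assigns the labels to the sorted unique values of the variable (1, 2, 3), not to the order in which the values appear, and the original object is left untouched because the result is assigned to a new name. The following sketch uses a made-up data frame toy (not affective.dis) and assumes dplyr 1.0.0 or later for relocate().

```r
library(dplyr)

# Hypothetical data with unsorted rows, NOT the affective.dis data set
toy <- data.frame(
  id        = c(3, 1, 2),
  treatment = c(3, 1, 2),
  friends   = c(5, 0, 2),
  sex       = c("male", "female", "male")
)

toy2 <- toy %>%
  mutate(treatment = factor(treatment,
                            labels = c("Exercise",      # label for code 1
                                       "Mindfulness",   # label for code 2
                                       "CBT"))) %>%     # label for code 3
  relocate(sex, .after = id) %>%       # move sex to the second column
  rename(n.friends = friends) %>%      # new_name = old_name
  arrange(id)                          # sort rows by id, ascending

names(toy2)                  # column order and names of the new object
as.character(toy2$treatment) # labels after sorting by id
names(toy)                   # the original object keeps its columns
```

Because the labels are matched to the sorted codes, the participant with treatment = 3 is labelled CBT even though that row comes first in toy.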