3.5 Data wrangling with the tidyverse

First of all, we need to load the tidyverse, which includes a suite of packages such as dplyr, tidyr, and ggplot2. If we load neither the tidyverse nor a package that re-exports the pipe (such as dplyr or tidyr), we will need to load the package magrittr to use the pipe operator %>%.
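As a minimal sketch of that last point, the following code assumes that only magrittr is installed and loaded, with no tidyverse package attached. The pipe takes the object on its left and passes it as the first argument of the function on its right, so x %>% sqrt() %>% sum() is the same as sum(sqrt(x)). The vector x is an arbitrary example.

```r
# Only magrittr is loaded here; no tidyverse package is attached
library(magrittr)

x <- c(4, 9, 16)  # arbitrary example vector

# The pipe passes the left-hand object as the first argument on the right
piped  <- x %>% sqrt() %>% sum()
nested <- sum(sqrt(x))

piped                     # 9
identical(piped, nested)  # TRUE
```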


library(tidyverse)

We are going to use the simulated data set affective.dis to perform a sequence of operations that ends with the computation of two descriptive statistics, the mean (M) and the standard deviation (SD), of the variables depression and life.satis, for female participants only and separately for each of the three experimental conditions of the variable treatment.

First, the data set affective.dis is piped to the first operation, which selects the variables id, treatment, sex, depression, and life.satis. Second, the resulting data set is piped to the second operation, which filters the rows that meet the logical condition sex == 'female' (i.e., it keeps only the rows of female participants). Third, the data are grouped by the three experimental conditions of the variable treatment, so that the next operation is carried out separately within each group. Last, we compute the means and standard deviations of the variables depression and life.satis.
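To make explicit what each pipe passes along, the four steps can also be written as separate calls that store their results in intermediate objects. Because affective.dis itself is not reproduced in this section, the sketch assumes a hypothetical reduced copy, affective.mini, rebuilt from the id, sex, age, and depression values in the affective.dis2 printout of this section, with treatment assumed to be coded 1, 2, 3 for Exercise, Mindfulness, and CBT; life.satis is left out because it is printed only in rounded form. The object names step1 to step4 are illustrative.

```r
library(dplyr)

# Hypothetical reduced copy of affective.dis, rebuilt from the
# affective.dis2 printout (treatment codes 1, 2, 3 are an assumption)
affective.mini <- data.frame(
  id         = 1:36,
  treatment  = rep(1:3, each = 12),
  sex        = rep(c('male', 'female'), times = 18),
  age        = c(23, 19, 24, 21, 20, 21, 30, 25, 30, 19, 27, 26,
                 28, 26, 29, 22, 19, 29, 30, 30, 25, 30, 26, 23,
                 24, 24, 30, 29, 19, 25, 24, 29, 30, 18, 21, 18),
  depression = c(22, 19, 14, 23, 16, 11, 29, 19, 17, 14, 20, 22,
                 11,  8,  5,  7, 13, 10,  9,  7, 16, 13, 11, 14,
                  8, 12,  7,  4,  9,  7, 11,  3,  5,  9, 14,  4)
)

# Each step is a separate call whose result is stored and reused
step1 <- select(affective.mini, id, treatment, sex, depression)  # drops age
step2 <- filter(step1, sex == 'female')                          # 18 rows left
step3 <- group_by(step2, treatment)                              # 3 groups
step4 <- summarise(step3,
                   mean_depression = mean(depression),
                   sd_depression   = sd(depression))
step4
```

The piped code that follows performs the same steps without naming step1 to step4: each %>% hands the previous result to the next function as its first argument.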


affective.dis %>% 
  select(id, treatment, sex, depression, life.satis) %>%  # keep five variables
  filter(sex == 'female') %>%                              # female rows only
  group_by(treatment) %>%                                  # one group per condition
  summarise(mean_depression = mean(depression),            # M and SD within each group
            sd_depression = sd(depression),
            mean_life.satis = mean(life.satis),
            sd_life.satis = sd(life.satis))
## # A tibble: 3 × 5
##   treatment mean_depression sd_depression mean_life.satis sd_life.satis
##       <int>           <dbl>         <dbl>           <dbl>         <dbl>
## 1         1           18             4.65            41.5         12.5 
## 2         2            9.83          3.06            41.7          9.47
## 3         3            6.5           3.51            51.2          9.90
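To see what group_by() and summarise() compute for each group, the depression statistics in the table above can be cross-checked with base R's tapply(), which splits a vector by a grouping variable and applies a function to each piece. This is a swapped-in base R technique, not part of the tidyverse workflow, and the vectors are a hypothetical reconstruction from the affective.dis2 printout of this section (treatment assumed to be coded 1, 2, 3).

```r
# Hypothetical vectors rebuilt from the affective.dis2 printout
depression <- c(22, 19, 14, 23, 16, 11, 29, 19, 17, 14, 20, 22,
                11,  8,  5,  7, 13, 10,  9,  7, 16, 13, 11, 14,
                 8, 12,  7,  4,  9,  7, 11,  3,  5,  9, 14,  4)
sex       <- rep(c('male', 'female'), times = 18)
treatment <- rep(1:3, each = 12)

# Keep female participants only (the filter() step)
is_female <- sex == 'female'

# Split depression by treatment and apply mean() and sd() to each group
m <- tapply(depression[is_female], treatment[is_female], mean)
s <- tapply(depression[is_female], treatment[is_female], sd)

round(m, 2)  # 18.00 9.83 6.50
round(s, 2)  # 4.65 3.06 3.51
```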

Sometimes we use pipes to chain operations on a data set, but the goal is not to print an output; rather, it is to modify the data set and keep those changes. In these cases, we use the assignment operator (<-) to save the result, either into the same R object (overwriting it) or into a different one. For example, in the following R code, instead of overwriting the former R object affective.dis, we assign the new data set to the R object affective.dis2. Along the way, (1) we drop the variable year because it is a constant; (2) we transform the variable treatment into a factor, assigning labels to its three experimental conditions; (3) we move the variables sex and age to the second and third columns, respectively, and the variable satis to the position right after the variable life.satis; (4) we rename the variable friends as n.friends to convey that this column stores the number of friends of each participant; and (5) we sort the rows of the new data set by id, from the first (id = 1) to the last (id = 36) participant.
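The factor() step relies on how labels are assigned: the distinct values of treatment are sorted (1, 2, 3) and the labels are attached in that order, so 1 becomes Exercise, 2 Mindfulness, and 3 CBT. A minimal base R sketch with an arbitrary example vector, codes, deliberately given out of order:

```r
# Arbitrary example codes, deliberately out of order
codes <- c(3, 1, 2, 1)

# Sorted unique values (1, 2, 3) are matched to the labels in order
trt <- factor(codes, labels = c('Exercise', 'Mindfulness', 'CBT'))

levels(trt)        # "Exercise" "Mindfulness" "CBT"
as.character(trt)  # "CBT" "Exercise" "Mindfulness" "Exercise"
as.integer(trt)    # 3 1 2 1
```

If the codes did not sort into the intended order, the levels argument of factor() could be used to state the order explicitly before the labels are applied.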


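affective.dis2 <- affective.dis %>% 
  select(-year) %>%                          # drop the constant variable year
  mutate(treatment = factor(treatment,       # codes 1, 2, 3 become labels
                            labels = c('Exercise',
                                       'Mindfulness',
                                       'CBT'))) %>%
  relocate(sex, age, .after = id) %>%        # sex and age become columns 2 and 3
  relocate(satis, .after = life.satis) %>%   # satis moves right after life.satis
  rename(n.friends = friends) %>%            # new name = old name
  arrange(id)                                # sort rows by ascending id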

affective.dis2
##    id    sex age   treatment depression life.satis  satis n.friends alone
## 1   1   male  23    Exercise         22   53.50930   High         0     0
## 2   2 female  19    Exercise         19   42.10586 Medium         4     0
## 3   3   male  24    Exercise         14   22.73856    Low         2     1
## 4   4 female  21    Exercise         23   18.42421    Low        11     0
## 5   5   male  20    Exercise         16   68.70935   High         2     1
## 6   6 female  21    Exercise         11   40.22030 Medium         0     1
## 7   7   male  30    Exercise         29   32.00596 Medium         9     0
## 8   8 female  25    Exercise         19   44.29188 Medium        11     1
## 9   9   male  30    Exercise         17   50.12402   High        19     1
## 10 10 female  19    Exercise         14   54.88683   High        18     1
## 11 11   male  27    Exercise         20   40.08320 Medium        18     1
## 12 12 female  26    Exercise         22   48.86415   High        13     0
## 13 13   male  28 Mindfulness         11   31.99603 Medium        15     0
## 14 14 female  26 Mindfulness          8   29.28087 Medium         1     0
## 15 15   male  29 Mindfulness          5   64.95737   High        10     0
## 16 16 female  22 Mindfulness          7   52.87438   High         3     1
## 17 17   male  19 Mindfulness         13   38.76411 Medium        10     1
## 18 18 female  29 Mindfulness         10   31.53965 Medium         5     0
## 19 19   male  30 Mindfulness          9   39.57584 Medium        19     0
## 20 20 female  30 Mindfulness          7   49.12089   High         9     1
## 21 21   male  25 Mindfulness         16   61.50143   High        12     1
## 22 22 female  30 Mindfulness         13   44.87210 Medium         3     0
## 23 23   male  26 Mindfulness         11   49.37174   High        19     0
## 24 24 female  23 Mindfulness         14   42.72718 Medium        19     0
## 25 25   male  24         CBT          8   40.50052 Medium        16     0
## 26 26 female  24         CBT         12   55.23754   High         2     0
## 27 27   male  30         CBT          7   50.43363   High         7     0
## 28 28 female  29         CBT          4   37.22818 Medium        12     0
## 29 29   male  19         CBT          9   58.75117   High         2     0
## 30 30 female  25         CBT          7   52.71514   High         2     0
## 31 31   male  24         CBT         11   37.55278 Medium        15     0
## 32 32 female  29         CBT          3   58.79335   High        12     0
## 33 33   male  30         CBT          5   19.83120    Low         7     0
## 34 34 female  18         CBT          9   62.02641   High        10     0
## 35 35   male  21         CBT         14   19.23629    Low        16     0
## 36 36 female  18         CBT          4   41.11086 Medium         4     0
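The column and row moves above (dropping year, relocating sex, age, and satis, renaming friends, and sorting by id) can be checked on a small scale. The sketch below uses a hypothetical four-row data frame, toy, whose column names mimic affective.dis; its values and its original column order are made up for illustration, since the section does not print the original affective.dis. It also shows that the pipeline leaves toy itself unchanged: only the assigned copy, toy2, holds the modifications.

```r
library(dplyr)

# Hypothetical toy data; column names mimic affective.dis, values are made up
toy <- data.frame(
  id         = c(3, 1, 4, 2),
  year       = 2020,                      # a constant column
  treatment  = c(1, 1, 2, 2),
  satis      = c('Low', 'High', 'Medium', 'High'),
  depression = c(14, 22, 8, 19),
  life.satis = c(22.7, 53.5, 29.3, 42.1),
  friends    = c(2, 0, 1, 4),
  sex        = c('male', 'male', 'female', 'female'),
  age        = c(24, 23, 26, 19)
)

toy2 <- toy %>%
  select(-year) %>%                          # drop the constant column
  relocate(sex, age, .after = id) %>%        # sex, age become columns 2 and 3
  relocate(satis, .after = life.satis) %>%   # satis now follows life.satis
  rename(n.friends = friends) %>%            # new name on the left, old on the right
  arrange(id)                                # rows sorted by ascending id

names(toy2)
# "id" "sex" "age" "treatment" "depression" "life.satis" "satis" "n.friends"
toy2$id     # 1 2 3 4
names(toy)  # unchanged: still contains year and friends
```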