Chapter 3 Data Wrangling

LEARNING OUTCOMES

  • Identify the capabilities of R and RStudio's environment and appraise their functionality.
  • Distinguish between R functions, objects, and diverse data wrangling approaches.
  • Apply basic programming skills to import and organize the data using classic data wrangling approaches and the tidyverse grammar.
  • Evaluate the R code and appraise the outputs to demonstrate a satisfactory level of basic programming skills in R.

We will create a data frame by simulating a data set. A data frame is a data structure made up of column vectors (i.e., variables) that can hold different types of data (e.g., numeric, factor, character). Observations, participants, or cases are usually displayed as rows. Using the assignment operator (<-), we will store the simulated data in R objects named according to the variables they represent (e.g., id, treatment, life.satis).


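id <- 1:36                                     # participant identifier, 1 to 36
treatment <- rep(1:3, each = 12)               # three groups of 12 participants
year <- rep(2020, times = 36)                  # the same year for every row
sex <- rep(c('male', 'female'), times = 18)    # alternating male/female
# 36 depression scores entered by hand
depression <- c(22, 19, 14, 23, 16, 11, 29, 19,
                17, 14, 20, 22, 11, 8, 5, 7, 13,
                10, 9, 7, 16, 13, 11, 14, 8, 12,
                7, 4, 9, 7, 11, 3, 5, 9, 14, 4)
life.satis <- rnorm(36, mean = 40, sd = 12.3)  # random draws from a normal distribution
alone <- rbinom(36, 1, .29)                    # random 0/1 draws with P(1) = .29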
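
Because rnorm() and rbinom() draw random values, life.satis and alone will differ each time the code above is run, so they will not exactly match the output printed in this chapter. The following is a minimal sketch of how such draws could be made reproducible with set.seed(). The seed value 123 and the object names ending in .a and .b are our own illustrative choices, not part of the original simulation.

```r
# Fix the state of the random number generator. The seed 123 is an
# arbitrary, illustrative choice; it is not the seed behind the output
# shown in this chapter.
set.seed(123)
life.satis.a <- rnorm(36, mean = 40, sd = 12.3)
alone.a <- rbinom(36, 1, .29)

# Resetting the same seed reproduces exactly the same draws.
set.seed(123)
life.satis.b <- rnorm(36, mean = 40, sd = 12.3)
alone.b <- rbinom(36, 1, .29)

identical(life.satis.a, life.satis.b)  # TRUE
identical(alone.a, alone.b)            # TRUE
```

Calling set.seed() once before the simulation is usually enough for anyone who reruns the script to obtain the same data frame.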

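The function data.frame() combines the column vectors passed as arguments into a single data frame. The function head() displays the first rows of the data frame (here, the first eight), and the function names() returns the variables' names.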



affective.dis <- data.frame(id, treatment, year, sex,
                            depression, life.satis, alone)
head(affective.dis, 8)
##   id treatment year    sex depression life.satis alone
## 1  1         1 2020   male         22   53.50930     0
## 2  2         1 2020 female         19   42.10586     0
## 3  3         1 2020   male         14   22.73856     1
## 4  4         1 2020 female         23   18.42421     0
## 5  5         1 2020   male         16   68.70935     1
## 6  6         1 2020 female         11   40.22030     1
## 7  7         1 2020   male         29   32.00596     0
## 8  8         1 2020 female         19   44.29188     1

names(affective.dis)
## [1] "id"         "treatment"  "year"       "sex"        "depression"
## [6] "life.satis" "alone"
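
To check that each column holds the type of data we intended, base R functions such as dim(), str(), and sapply() can be applied to a data frame. The sketch below builds a smaller, six-row data frame so that it runs on its own; the name toy.df and its values are hypothetical and used only for illustration.

```r
# A small stand-alone data frame; toy.df and its values are illustrative only.
toy.df <- data.frame(id = 1:6,
                     treatment = rep(1:3, each = 2),
                     sex = rep(c('male', 'female'), times = 3),
                     depression = c(22, 19, 14, 23, 16, 11))

dim(toy.df)            # number of rows and number of columns
str(toy.df)            # compact summary of each column's type and first values
sapply(toy.df, class)  # class of every column
```

In R 4.0.0 and later, a column such as sex is stored as character; earlier versions convert it to a factor by default unless stringsAsFactors = FALSE is supplied to data.frame().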

In the following sections, we will apply basic data wrangling functions to reorganize and tidy the data frame affective.dis (i.e., affective disorders), which contains the column vectors we have just simulated and inspected: id, treatment, year, sex, depression, life.satis, and alone.