Chapter 3 Data Wrangling
LEARNING OUTCOMES
- Identify the capabilities of R and RStudio's environment and appraise their functionality.
- Distinguish between R functions, objects, and diverse data wrangling approaches.
- Apply basic programming skills to import and organize the data using classic data wrangling approaches and the tidyverse grammar.
- Evaluate the R code and appraise the outputs to demonstrate a satisfactory level of basic programming skills in R.
We are going to create a data frame by simulating a data set. Data frames are data structures made up of column vectors (i.e., variables) that can hold different modes of data (e.g., numeric, factor, character). The observations, participants, or cases are usually displayed as rows. Using the assignment operator (<-), we will store the simulated data in R objects named after the variables they are intended to represent (e.g., id, treatment, life.satis).
id <- 1:36
treatment <- rep(1:3, each = 12)
year <- rep(2020, times = 36)
sex <- rep(c('male', 'female'), times = 18)
depression <- c(22, 19, 14, 23, 16, 11, 29, 19,
                17, 14, 20, 22, 11, 8, 5, 7, 13,
                10, 9, 7, 16, 13, 11, 14, 8, 12,
                7, 4, 9, 7, 11, 3, 5, 9, 14, 4)
life.satis <- rnorm(36, mean = 40, sd = 12.3)
alone <- rbinom(36, 1, .29)
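Because rnorm() and rbinom() draw random values, the simulated life.satis and alone values will generally change from run to run, so they need not match the values shown in this chapter. A minimal sketch of one way to make such draws repeatable uses set.seed(). The seed value 123 and the object names first.draw, second.draw, and third.draw are hypothetical and do not come from the original example:

```r
# Hypothetical seed; any integer works (123 is not from the original text)
set.seed(123)
first.draw <- rnorm(36, mean = 40, sd = 12.3)

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
second.draw <- rnorm(36, mean = 40, sd = 12.3)

identical(first.draw, second.draw)  # TRUE

# Without resetting the seed, the next call continues the random stream,
# so the values differ from the first draw
third.draw <- rnorm(36, mean = 40, sd = 12.3)
identical(first.draw, third.draw)
```

Placing a single set.seed() call before the simulation code would be enough to obtain the same data frame on every run.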
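The function data.frame() combines the column vectors passed as arguments into a single data frame. The function head() displays the first rows of the data frame (here, the first eight), and the function names() shows the variables' names.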
affective.dis <- data.frame(id, treatment, year, sex,
                            depression, life.satis, alone)
head(affective.dis, 8)
##   id treatment year    sex depression life.satis alone
## 1  1         1 2020   male         22   53.50930     0
## 2  2         1 2020 female         19   42.10586     0
## 3  3         1 2020   male         14   22.73856     1
## 4  4         1 2020 female         23   18.42421     0
## 5  5         1 2020   male         16   68.70935     1
## 6  6         1 2020 female         11   40.22030     1
## 7  7         1 2020   male         29   32.00596     0
## 8  8         1 2020 female         19   44.29188     1
names(affective.dis)
## [1] "id" "treatment" "year" "sex" "depression"
## [6] "life.satis" "alone"
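To check that the columns of a data frame really hold different modes of data, one option is to inspect the class of each column. The sketch below uses a small toy data frame instead of affective.dis. The object names toy and col.classes, and the values, are hypothetical and chosen only for illustration. The argument stringsAsFactors = FALSE is written out explicitly so that the result does not depend on the R version (it has been the default since R 4.0.0):

```r
# Toy data frame; names and values are illustrative only
toy <- data.frame(id = 1:4,
                  sex = rep(c('male', 'female'), times = 2),
                  score = c(22, 19, 14, 23),
                  stringsAsFactors = FALSE)

# Class of each column: integer, character, and numeric
col.classes <- sapply(toy, class)
col.classes

# Convert sex to a factor explicitly when a categorical variable is wanted
toy$sex <- factor(toy$sex)
class(toy$sex)   # "factor"
levels(toy$sex)  # "female" "male" (levels are sorted alphabetically)

# str() gives a compact overview of the dimensions and column types
str(toy)
```

The same calls, for instance sapply(affective.dis, class) or str(affective.dis), could be applied to the simulated data frame.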
In the following sections, we will apply basic data wrangling functions to reorganize and tidy the data frame called affective.dis (i.e., affective disorders), a data set that includes the column vectors that we have just simulated and inspected: id, treatment, year, sex, depression, life.satis, and alone.