3.3 Renaming, recoding, and sorting data
3.3.1 Renaming
To rename variables, we use the function names() together with indexing (i.e., square brackets). The assignment operator (<-) then assigns the new name to the selected variable. For example, to give y (the number of friends) a more descriptive name (i.e., friends), we assign the new name to the 9th column of the data set, which is where y is located.
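The positional index (9) only works while y stays in the 9th column. A sketch of a more robust variant matches the column by its current name instead. It uses a small hypothetical data frame, toy, in place of affective.dis. With the real data, the equivalent line would be names(affective.dis)[names(affective.dis) == 'y'] <- 'friends'.

```r
# Hypothetical toy data frame standing in for affective.dis
toy <- data.frame(id = 1:3, alone = c(0, 0, 1), y = c(0, 4, 2))

# Rename by matching the current name instead of the column position,
# so the line still works if the columns are reordered later
names(toy)[names(toy) == 'y'] <- 'friends'

names(toy)  # "id" "alone" "friends"
```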
names(affective.dis)[9] <- 'friends'
head(affective.dis)
## id treatment year sex age depression life.satis alone friends
## 1 1 1 2020 male 23 22 53.50930 0 0
## 2 2 1 2020 female 19 19 42.10586 0 4
## 3 3 1 2020 male 24 14 22.73856 1 2
## 4 4 1 2020 female 21 23 18.42421 0 11
## 5 5 1 2020 male 20 16 68.70935 1 2
## 6 6 1 2020 female 21 11 40.22030 1 0
3.3.2 Recoding
We can recode a variable into a different column (i.e., creating a new variable with the new codes while keeping the old one) or into the same column (i.e., overwriting the existing variable). In the following example, we create a new variable called satis. This ordinal variable groups the Satisfaction with Life scale into three ordered bands (i.e., Low, Medium, High), with cut-off points at 26 and 47:

- scores below 26 are coded Low;
- scores from 26 up to, but not including, 47 are coded Medium;
- scores of 47 or higher are coded High.

We recode the values of life.satis into satis using conditional subsetting with logical operators. Finally, we reorder the columns so that satis appears right after life.satis.
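An alternative to conditional subsetting is base R's cut(). A minimal sketch follows, using hypothetical scores that include values exactly at both cut-off points. With right = FALSE, the intervals are closed on the left, so they reproduce the rules above. ordered_result = TRUE stores satis as an ordered factor rather than a character vector, which keeps the Low < Medium < High ordering. With the real data, the call would be applied to affective.dis$life.satis.

```r
# Hypothetical life-satisfaction scores, including 26 and 47 exactly
life.satis <- c(53.5, 42.1, 22.7, 26, 47, 18.4)

# right = FALSE builds the intervals [-Inf, 26), [26, 47), [47, Inf),
# i.e. < 26 -> Low, >= 26 & < 47 -> Medium, >= 47 -> High
satis <- cut(life.satis,
             breaks = c(-Inf, 26, 47, Inf),
             labels = c('Low', 'Medium', 'High'),
             right = FALSE,
             ordered_result = TRUE)

satis
```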
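# Scores below 26
affective.dis$satis[affective.dis$life.satis < 26] <- 'Low'
# Scores from 26 up to (but not including) 47
affective.dis$satis[affective.dis$life.satis >= 26 &
                      affective.dis$life.satis < 47] <- 'Medium'
# Scores of 47 or higher
affective.dis$satis[affective.dis$life.satis >= 47] <- 'High'
# Column order: id to life.satis (1:7), satis (10), friends (9), alone (8)
affective.dis <- affective.dis[c(1:7, 10, 9, 8)]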
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 1 1 1 2020 male 23 22 53.50930 High 0 0
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
## 3 3 1 2020 male 24 14 22.73856 Low 2 1
## 4 4 1 2020 female 21 23 18.42421 Low 11 0
## 5 5 1 2020 male 20 16 68.70935 High 2 1
## 6 6 1 2020 female 21 11 40.22030 Medium 0 1
3.3.3 Sorting
To sort our data set for visual inspection, we use the function order(). It returns the row positions that would put a variable in order, and we use these positions to index the rows of the data set. With decreasing = FALSE (the default), the lowest values come first. With decreasing = TRUE, the highest values come first.
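The key point is that order() returns positions, not the sorted values themselves. A minimal sketch on a hypothetical vector of four depression scores makes this visible.

```r
# Hypothetical depression scores for four participants
depression <- c(22, 19, 29, 14)

# order() returns positions, not values
order(depression)                     # 4 2 1 3 (lowest score first)
order(depression, decreasing = TRUE)  # 3 1 2 4 (highest score first)

# Indexing with those positions yields the sorted scores
depression[order(depression, decreasing = TRUE)]  # 29 22 19 14
```

The same positions, placed before the comma in affective.dis[ , ], reorder whole rows instead of a single vector.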
affective.dis <- affective.dis[order(affective.dis$depression,
                                     decreasing = TRUE), ]
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 7 7 1 2020 male 30 29 32.00596 Medium 9 0
## 4 4 1 2020 female 21 23 18.42421 Low 11 0
## 1 1 1 2020 male 23 22 53.50930 High 0 0
## 12 12 1 2020 female 26 22 48.86415 High 13 0
## 11 11 1 2020 male 27 20 40.08320 Medium 18 1
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
We might also be interested in inspecting depression scores as a function of two or more variables. For instance, to inspect depression by age, we pass two arguments to order(). The observations are ordered first by age and then, within each age, by depression.
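A minimal sketch of multi-key ordering on a hypothetical five-row data frame follows. The second call shows one way to mix directions: negating a numeric key sorts it in descending order while the first key stays ascending. This trick applies only to numeric variables.

```r
# Hypothetical participants with ages and depression scores
toy <- data.frame(id = 1:5,
                  age = c(19, 18, 19, 18, 20),
                  depression = c(19, 9, 13, 4, 11))

# Age ascending; ties broken by depression ascending
by.age <- toy[order(toy$age, toy$depression), ]
by.age$id       # 4 2 3 1 5

# Age ascending; depression descending within each age
by.age.desc <- toy[order(toy$age, -toy$depression), ]
by.age.desc$id  # 2 4 1 3 5
```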
affective.dis <- affective.dis[order(affective.dis$age,
                                     affective.dis$depression), ]
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 36 36 3 2020 female 18 4 41.11086 Medium 4 0
## 34 34 3 2020 female 18 9 62.02641 High 10 0
## 29 29 3 2020 male 19 9 58.75117 High 2 0
## 17 17 2 2020 male 19 13 38.76411 Medium 10 1
## 10 10 1 2020 female 19 14 54.88683 High 18 1
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
When we assign the result of order() back to the data set, we change the original order of its rows. It is good practice to restore the original order afterwards by sorting on the variable id (1:n). Remember to include the comma inside the square brackets (i.e., [rows, columns]) and leave the column position empty. This sorts the rows and keeps all columns. Without the comma, R would treat the index as a column selection.
affective.dis <- affective.dis[order(affective.dis$id), ]
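The round trip, and the pitfall of omitting the comma, can be sketched on a hypothetical three-column data frame. Without the comma, the five positions returned by order() are read as column indices. The call then fails because the data frame has only three columns.

```r
# Hypothetical data frame with a sequential id (1:n)
toy <- data.frame(id = 1:5,
                  age = c(19, 18, 19, 18, 20),
                  depression = c(19, 9, 13, 4, 11))

# Sort by depression (row order changes), then restore the id order
sorted <- toy[order(toy$depression, decreasing = TRUE), ]
restored <- sorted[order(sorted$id), ]
restored$id  # 1 2 3 4 5

# Without the comma the index selects columns, not rows
res <- tryCatch(sorted[order(sorted$id)], error = function(e) e)
conditionMessage(res)  # "undefined columns selected"
```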