3.3 Renaming, recoding, and sorting data
3.3.1 Renaming
To rename variables, we will use the indexing approach (i.e., square brackets) with the function names(). The assignment operator will assign the new names to the variables of our data set. For example, to rename y (the number of friends) with a more descriptive name (i.e., friends), we will assign the new variable name to the 9th column of the data set (i.e., y).
names(affective.dis)[9] <- 'friends'
head(affective.dis)
## id treatment year sex age depression life.satis alone friends
## 1 1 1 2020 male 23 22 53.50930 0 0
## 2 2 1 2020 female 19 19 42.10586 0 4
## 3 3 1 2020 male 24 14 22.73856 1 2
## 4 4 1 2020 female 21 23 18.42421 0 11
## 5 5 1 2020 male 20 16 68.70935 1 2
## 6 6 1 2020 female 21 11 40.22030 1 0
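A positional index such as [9] renames whatever column happens to be in that position, so it gives the wrong result if the column order changes. As a minimal sketch, assuming a small hypothetical data frame called toy (not the affective.dis data), the same renaming can be done by matching the current name instead:

```r
# Hypothetical toy data frame standing in for affective.dis
toy <- data.frame(id = 1:3, alone = c(0, 1, 0), y = c(0, 4, 2))

# Select the column by its current name rather than by its position
names(toy)[names(toy) == "y"] <- "friends"

names(toy)
```

The logical comparison names(toy) == "y" is TRUE only for the column to be renamed, so the assignment does not depend on where that column sits.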
3.3.2 Recoding
We can recode variables in different columns (i.e., creating a new variable with the new codes and keeping the old variable) or in the same column (i.e., overwriting the existing variable). In the following example, we create a new variable called satis. This new ordinal variable displays three ordered bands of the Satisfaction with Life scale (i.e., Low, Medium, High) with cut-off points set at 26 and 47. We recode the values of life.satis into the new variable satis using conditional subsetting with logical operators. The last assignment reorders the columns so that satis sits next to life.satis.
# Scores below 26 fall in the lowest band
affective.dis$satis[affective.dis$life.satis < 26] <- 'Low'
affective.dis$satis[affective.dis$life.satis >= 26 &
                    affective.dis$life.satis < 47] <- 'Medium'
affective.dis$satis[affective.dis$life.satis >= 47] <- 'High'
# Move satis (column 10) next to life.satis (column 7)
affective.dis <- affective.dis[c(1:7, 10, 9, 8)]
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 1 1 1 2020 male 23 22 53.50930 High 0 0
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
## 3 3 1 2020 male 24 14 22.73856 Low 2 1
## 4 4 1 2020 female 21 23 18.42421 Low 11 0
## 5 5 1 2020 male 20 16 68.70935 High 2 1
## 6 6 1 2020 female 21 11 40.22030 Medium 0 1
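The conditional subsetting above stores satis as text, so R does not know that Low comes before Medium and High. As an alternative sketch (not the approach used in this section), the base function cut() can build the same three bands as an ordered factor in one step. The scores below are hypothetical values chosen to sit on and around the cut-off points:

```r
# Hypothetical scores, including the two cut-off points themselves
life.satis <- c(18.4, 26, 42.1, 47, 68.7)

# right = FALSE closes each band on the left: [-Inf, 26), [26, 47), [47, Inf)
satis <- cut(life.satis,
             breaks = c(-Inf, 26, 47, Inf),
             labels = c("Low", "Medium", "High"),
             right = FALSE,
             ordered_result = TRUE)

data.frame(life.satis, satis)
```

With right = FALSE, a score of exactly 26 falls in Medium and a score of exactly 47 falls in High, which matches the >= conditions used in the subsetting approach.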
3.3.3 Sorting
If we are interested in sorting our data set for a visual inspection, we will use the function order(). Setting the argument decreasing to FALSE (the default) shows the lowest values first, whereas setting it to TRUE displays the highest values first.
affective.dis <- affective.dis[order(affective.dis$depression,
                                     decreasing = TRUE), ]
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 7 7 1 2020 male 30 29 32.00596 Medium 9 0
## 4 4 1 2020 female 21 23 18.42421 Low 11 0
## 1 1 1 2020 male 23 22 53.50930 High 0 0
## 12 12 1 2020 female 26 22 48.86415 High 13 0
## 11 11 1 2020 male 27 20 40.08320 Medium 18 1
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
We might also be interested in inspecting depression scores as a function of two or more variables. For instance, to inspect depression by age groups, we will include two arguments in the function order(). First, we will order the observations by age. Then, within every age, we will order our observations by depression.
affective.dis <- affective.dis[order(affective.dis$age,
                                     affective.dis$depression), ]
head(affective.dis)
## id treatment year sex age depression life.satis satis friends alone
## 36 36 3 2020 female 18 4 41.11086 Medium 4 0
## 34 34 3 2020 female 18 9 62.02641 High 10 0
## 29 29 3 2020 male 19 9 58.75117 High 2 0
## 17 17 2 2020 male 19 13 38.76411 Medium 10 1
## 10 10 1 2020 female 19 14 54.88683 High 18 1
## 2 2 1 2020 female 19 19 42.10586 Medium 4 0
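The argument decreasing applies to every sorting variable at once. When the sorting variables should run in opposite directions, one common option for numeric variables is to put a minus sign in front of the variable to be reversed. The following is a minimal sketch with a hypothetical data frame called toy, sorting by ascending age and, within each age, by descending depression:

```r
# Hypothetical toy data frame (not the affective.dis data)
toy <- data.frame(id = 1:5,
                  age = c(19, 18, 19, 18, 19),
                  depression = c(13, 9, 19, 4, 14))

# Ascending age; the minus sign reverses the order of the numeric second key
toy.sorted <- toy[order(toy$age, -toy$depression), ]

toy.sorted
```

The minus sign only works for numeric variables; it is shown here as one possible approach rather than the only one.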
When we use the function order(), we change the original order of the rows of our data set. It is always convenient to restore the original order by using the variable id (1:n). Remember to leave the position after the comma empty within the square brackets (i.e., [rows, ]) so that the rows are sorted by id and all the columns are kept.
affective.dis <- affective.dis[order(affective.dis$id), ]
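As a small check of this idea, the sketch below sorts a hypothetical data frame called toy, then orders it by id again and confirms that the rows are back in their original sequence. The sorted rows are stored in a separate object (sorted, a name chosen for this example) so that toy itself is never modified:

```r
# Hypothetical toy data frame (not the affective.dis data)
toy <- data.frame(id = 1:4, depression = c(22, 19, 14, 23))

# Sorting for inspection changes the row order
sorted <- toy[order(toy$depression, decreasing = TRUE), ]

# Ordering by id brings the rows back to their original sequence
restored <- sorted[order(sorted$id), ]

restored
```

Storing the sorted rows in a new object is an alternative to overwriting the data set: the original order is still available in toy, at the cost of keeping two copies in memory.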