10.2 Dealing with categorical variables in R

Categorical variables are called factors in R. A factor assigns each observation to one of a fixed set of categories (its levels), which either carry identity only (e.g., helping behavior) or also imply an order (e.g., grades). The function factor() converts a variable into an unordered factor (binomial and multinomial data) or an ordered factor (ordinal data). The argument levels sets the values that define the categories, and the argument labels sets the name displayed for each category.
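
As a minimal sketch of these arguments, the toy vectors below (answer and grade are hypothetical values, not part of the data set used in this section) are converted into an unordered and an ordered factor. The helper is.ordered() then reports which kind of factor we obtained.

```r
# Hypothetical toy data, used only to illustrate the arguments of factor()
answer <- c(1, 0, 0, 1, 1)
grade  <- c(2, 3, 1, 2, 3)

# Unordered factor: the levels have identity but no rank
answer.f <- factor(answer, levels = c(0, 1), labels = c('No', 'Yes'))

# Ordered factor: the levels are ranked in the order they are listed
grade.f <- factor(grade, levels = c(1, 2, 3),
                  labels = c('Low', 'Medium', 'High'), ordered = TRUE)

levels(answer.f)       # category labels, in level order
is.ordered(answer.f)   # FALSE for an unordered factor
is.ordered(grade.f)    # TRUE for an ordered factor
```

Note that labels are matched to levels by position, so the first label names the first level, the second label names the second level, and so on.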

Let's simulate a data set called help.data with observations of 18 male and female bystanders who either helped or did not help a person in distress, and who were asked how often they donate money to charities. A fifth variable, coop, holds responses on a three-point scale. We create the vectors first and then combine them into a data frame with the function data.frame(). Exploring the structure of the R object with the function str() shows that id is an integer vector and sex is a character vector, whereas help.behavior, donation, and coop are numeric vectors.


id <- 1:18
sex <- c('male', 'female', 'male', 'male', 'male', 'male',
         'female', 'female', 'male', 'female', 'female', 'female',
         'female', 'male', 'male', 'female', 'female', 'male')
help.behavior <- c(1, 1, 0, 0, 0, 0,
                   0, 1, 0, 1, 1, 0,
                   0, 0, 1, 1, 1, 1)
donation <- c(0, 1, 1, 0, 0, 2,
              0, 3, 1, 2, 2, 0,
              1, 0, 3, 1, 2, 3)
coop <- c(2, 1, 2, 2, 1, 3,
          1, 2, 2, 3, 1, 2,
          2, 1, 3, 3, 2, 3)

help.data <- data.frame(id, sex, help.behavior, donation, coop)
help.data
##    id    sex help.behavior donation coop
## 1   1   male             1        0    2
## 2   2 female             1        1    1
## 3   3   male             0        1    2
## 4   4   male             0        0    2
## 5   5   male             0        0    1
## 6   6   male             0        2    3
## 7   7 female             0        0    1
## 8   8 female             1        3    2
## 9   9   male             0        1    2
## 10 10 female             1        2    3
## 11 11 female             1        2    1
## 12 12 female             0        0    2
## 13 13 female             0        1    2
## 14 14   male             0        0    1
## 15 15   male             1        3    3
## 16 16 female             1        1    3
## 17 17 female             1        2    2
## 18 18   male             1        3    3

str(help.data)
## 'data.frame':    18 obs. of  5 variables:
##  $ id           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sex          : chr  "male" "female" "male" "male" ...
##  $ help.behavior: num  1 1 0 0 0 0 0 1 0 1 ...
##  $ donation     : num  0 1 1 0 0 2 0 3 1 2 ...
##  $ coop         : num  2 1 2 2 1 3 1 2 2 3 ...
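
The reason sex shows up as chr rather than as a factor is that, in R 4.0.0 and later, data.frame() keeps strings as character vectors by default. The sketch below uses only the first six observations of sex and help.behavior, and the object names d.chr and d.fct are chosen here for illustration. It shows that the argument stringsAsFactors = TRUE converts character columns at creation time, but gives us no control over the order or the labels of the levels, which is why we prefer factor() in this section.

```r
# First six observations of sex and help.behavior from help.data
sex <- c('male', 'female', 'male', 'male', 'male', 'male')
help.behavior <- c(1, 1, 0, 0, 0, 0)

# Default behavior (assuming R >= 4.0.0): strings stay character
d.chr <- data.frame(sex, help.behavior)

# Explicit conversion of character columns into factors
d.fct <- data.frame(sex, help.behavior, stringsAsFactors = TRUE)

sapply(d.chr, class)   # class of each column
sapply(d.fct, class)
levels(d.fct$sex)      # levels are sorted alphabetically: "female" "male"
```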

Using the function mutate() from the package dplyr, we can transform the character variable (sex) and the three numeric variables (help.behavior, donation, and coop) into factors. To do so, we call the function factor() with the variable that we want to transform as the first argument, followed by three more arguments: levels (i.e., the values that define the categories), labels (i.e., the names of the categories), and ordered (i.e., FALSE for binomial/multinomial data and TRUE for ordinal data).



library(dplyr)

help.data <- help.data %>%
  mutate(
    sex = factor(sex, levels = c('male', 'female'),
                 labels = c('Male', 'Female'), ordered = FALSE),
    help.behavior = factor(help.behavior, levels = c(0, 1),
                           labels = c('No', 'Yes'), ordered = FALSE),
    donation = factor(donation, levels = c(0, 1, 2, 3),
                      labels = c('Never', 'Sometimes', 'Often', 'Always'),
                      ordered = TRUE),
    coop = factor(coop, levels = c(1, 2, 3),
                  labels = c('Untrue of me', 'Neutral', 'True of me'),
                  ordered = TRUE))

str(help.data)
## 'data.frame':    18 obs. of  5 variables:
##  $ id           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sex          : Factor w/ 2 levels "Male","Female": 1 2 1 1 1 1 2 2 1 2 ...
##  $ help.behavior: Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 2 1 2 ...
##  $ donation     : Ord.factor w/ 4 levels "Never"<"Sometimes"<..: 1 2 2 1 1 3 1 4 2 3 ...
##  $ coop         : Ord.factor w/ 3 levels "Untrue of me"<..: 2 1 2 2 1 3 1 2 2 3 ...
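
To see what the ordered argument buys us, the sketch below rebuilds the donation factor on its own (the name donation.f is introduced here for illustration) from the same scores used above. It is a small demonstration under those assumptions, not an additional analysis step.

```r
# donation scores copied from the vector defined earlier in this section
donation <- c(0, 1, 1, 0, 0, 2,
              0, 3, 1, 2, 2, 0,
              1, 0, 3, 1, 2, 3)

donation.f <- factor(donation, levels = c(0, 1, 2, 3),
                     labels = c('Never', 'Sometimes', 'Often', 'Always'),
                     ordered = TRUE)

levels(donation.f)           # labels in their ranked order
table(donation.f)            # frequency of each category
sum(donation.f >= 'Often')   # comparisons are meaningful for ordered factors
as.integer(donation.f)       # internal codes run from 1 to 4, not from 0 to 3
```

The last line is a reminder that a factor stores integer codes starting at 1, so as.integer() does not return the original 0 to 3 scores. The same comparison on an unordered factor such as sex returns NA with a warning, because its levels have no rank.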