10.1 Categorical data

10.1.1 Binomial and multinomial data

Binomial and multinomial data (also called binary and nominal data) have the property of identity. Each case or observation that we measure is assigned a single value that identifies it with one, and only one, category (e.g., Helping a person in distress versus Ignoring a person in distress).

The set of categories has to be exhaustive, that is, it must cover all the possible categories. For example, if we decide to classify bystanders as individuals helping or ignoring someone in distress, we might find some instrumental behaviors that are difficult to classify as either pro-social or anti-social. Consequently, we must include an additional category that captures these behaviors (Other behaviors). Similarly, if we have a large number of categories (e.g., political parties in a general election), we could keep the main ones (e.g., Conservative, Labour, and Liberal Democrats) and collapse the remaining categories into a general category (e.g., Other political parties).
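Collapsing minor categories into a catch-all category can be sketched in a few lines of Python. This is only an illustration: the list of votes is made up, and the names MAIN_PARTIES and collapse are hypothetical helpers rather than part of any library.

```python
from collections import Counter

# Hypothetical responses; the main party names come from the text,
# the smaller parties and the counts are invented for illustration.
votes = ["Conservative", "Labour", "Green", "Liberal Democrats", "Labour",
         "Conservative", "Plaid Cymru", "Labour"]

MAIN_PARTIES = {"Conservative", "Labour", "Liberal Democrats"}
OTHER_LABEL = "Other political parties"

def collapse(party, main=MAIN_PARTIES, other_label=OTHER_LABEL):
    """Keep a main party as it is; map every other party to the catch-all label."""
    return party if party in main else other_label

collapsed = [collapse(v) for v in votes]
counts = Counter(collapsed)

for category, n in counts.items():
    print(category, n)
```

After collapsing, every vote still belongs to exactly one category, so the reduced set remains exhaustive.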

Because the defining property of this measurement level is membership in one single category, the numeric value that we assign to each category is arbitrary. For instance, we could assign a 0 to the category Fail of the binary variable Grades and a 1 to the category Pass, a convention that makes the labels easier to recall and the outputs of the model easier to interpret. However, assigning a 10 to Fail and a 3 to Pass is also possible, because all that matters is that the values 10 and 3 are different.
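A short Python sketch can make the arbitrariness of the codes concrete. The grades below are invented, and tabulate is a hypothetical helper. Both codings produce the same frequency table, but only the 0/1 coding gives a mean that reads directly as the proportion of Pass.

```python
from collections import Counter

# Hypothetical observations of the binary variable Grades.
grades = ["Pass", "Fail", "Pass", "Pass", "Fail"]

coding_a = {"Fail": 0, "Pass": 1}   # conventional 0/1 coding
coding_b = {"Fail": 10, "Pass": 3}  # equally valid: the values only need to differ

def tabulate(labels, coding):
    """Code the labels numerically, count each code, and map the counts back to labels."""
    decode = {code: label for label, code in coding.items()}
    coded = [coding[label] for label in labels]
    return {decode[code]: n for code, n in Counter(coded).items()}

table_a = tabulate(grades, coding_a)
table_b = tabulate(grades, coding_b)

# The frequencies do not depend on which numbers we chose.
print(table_a == table_b)

# With 0/1 coding, the mean of the codes is the proportion of Pass.
mean_a = sum(coding_a[g] for g in grades) / len(grades)
# With 10/3 coding, the mean has no direct interpretation.
mean_b = sum(coding_b[g] for g in grades) / len(grades)
```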

10.1.2 Ordinal data

Ordinal data include the property of identity and the property of order. When we are measuring observations or cases that could be ordered, we will assign values to the ordered categories that will imply a higher or lower rank. The most basic ordinal variable includes three categories (e.g., Against, Undecided, and In favor). Some ordinal variables are bipolar (i.e., two poles or extremes; e.g., Bad, Neutral, and Good), whereas others are unipolar (i.e., one pole or extreme; e.g., No income, Annual gross salary lower than £25,000, Annual gross salary between £25,000 and £50,000, and Annual gross salary higher than £50,000).
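As a rough illustration of the property of order, the following Python sketch ranks the three categories mentioned above. The responses are invented, and rank and alt_rank are hypothetical names. Any coding that preserves the order of the categories yields the same sorted sequence and the same median category, which is why only the ranks, and not the distances between the codes, carry information.

```python
# Ordered categories, from lowest to highest rank.
LEVELS = ["Against", "Undecided", "In favor"]

# Consecutive codes 0, 1, 2 ...
rank = {label: i for i, label in enumerate(LEVELS)}
# ... and an alternative coding with unequal gaps but the same order.
alt_rank = {"Against": 1, "Undecided": 5, "In favor": 40}

# Hypothetical responses.
responses = ["In favor", "Against", "Undecided", "In favor", "Against"]

ordered = sorted(responses, key=lambda r: rank[r])
ordered_alt = sorted(responses, key=lambda r: alt_rank[r])

# Middle element of the sorted list (the number of responses is odd).
median_category = ordered[len(ordered) // 2]
print(ordered)
print(median_category)
```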

In psychology, Likert scales are very popular ordinal variables. We assume a monotonic increase across the ordered categories and provide meaningful anchoring points with their associated labels (e.g., Strongly disagree, Disagree, Slightly disagree, Neither disagree/agree, Slightly agree, Agree, and Strongly agree). We usually assign a 0 or a 1 to the first category and increase the value by one point for each subsequent category. Psychologists tend to assume that Likert scales with 5 or more anchoring points can be treated as quasi-interval.
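The usual coding of a Likert item can be sketched as follows. This is a minimal example under stated assumptions: the labels are listed from the lowest to the highest level of agreement, the answers are invented, and likert_codes is a hypothetical helper.

```python
# Seven anchoring points, ordered from lowest to highest agreement.
LIKERT_7 = [
    "Strongly disagree",
    "Disagree",
    "Slightly disagree",
    "Neither disagree/agree",
    "Slightly agree",
    "Agree",
    "Strongly agree",
]

def likert_codes(labels, start=1):
    """Assign `start` to the first label and add one point per subsequent label."""
    return {label: start + i for i, label in enumerate(labels)}

codes_from_1 = likert_codes(LIKERT_7)           # 1 to 7
codes_from_0 = likert_codes(LIKERT_7, start=0)  # 0 to 6

# Hypothetical answers from three respondents.
answers = ["Agree", "Neither disagree/agree", "Strongly agree"]
scored = [codes_from_1[a] for a in answers]
print(scored)  # [6, 4, 7]
```

Starting at 0 or at 1 only shifts every code by the same constant, so the order of the categories is unaffected.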
