15.1 Item Response Theory (IRT)

Item Response Theory (IRT) refers to a family of structural models that formalize the mathematical relationship between the response to a specific item and the respondent's level of ability or trait (Crocker & Algina, 1986; DeVellis, 2017). IRT models assume that a respondent's behavior when answering an item can be explained by one or more latent variables (i.e., unobserved traits or abilities). Consequently, a respondent's score on an ability test (e.g., a Spanish grammar test) is a function of that respondent's underlying ability or trait (e.g., their knowledge of Spanish grammar).

IRT focuses on the properties of individual items rather than on the scale as a whole. A key feature is that every respondent and every item can be located at a point on the same latent trait continuum (\(\theta\)). For this to hold, the scale must be unidimensional (i.e., a single ability or trait is enough to explain the respondents' answers and the relationships among the items). Nevertheless, later developments in test theory have led to multidimensional IRT (mIRT) models, which relax this requirement.
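A minimal sketch of the shared \(\theta\) scale, assuming the one-parameter logistic (Rasch) model, the simplest unidimensional IRT model. The helper name `rasch_probability` and the item difficulty used below are hypothetical illustrations, not taken from the text. Because the item difficulty \(b\) and the respondent's ability \(\theta\) are expressed on the same scale, a respondent located exactly at the item's position (\(\theta = b\)) has a 0.5 probability of answering correctly, and that probability rises as \(\theta\) moves above \(b\).

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct response) under the Rasch/1PL model (hypothetical helper)."""
    # theta (person) and b (item) share one latent scale; only their difference matters.
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item located at b = 0.5 on the trait continuum.
item_b = 0.5
for theta in (-1.0, 0.5, 2.0):
    print(f"theta = {theta:+.1f}  P(correct) = {rasch_probability(theta, item_b):.3f}")
```

Respondents located below the item are less likely than not to answer it correctly, and those located above it are more likely than not to do so. This is what it means for persons and items to share a single unidimensional continuum.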

IRT's item and test statistics (e.g., item parameters, the test information function and its standard error) differ greatly from those of Classical Test Theory (CTT; e.g., item-test correlation, coefficient alpha). IRT overcomes many of the limitations of CTT. For example, in IRT a shorter test can be more reliable than a longer one, whereas in CTT reliability generally increases with test length. As a trade-off, IRT rests on stronger assumptions than CTT and is therefore less flexible. Likewise, IRT requires a large number of respondents to estimate the item parameters and the Test Information Function accurately (Table 15.1).
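The idea that a shorter test can be more reliable can be illustrated with the test information function. The sketch below assumes the two-parameter logistic (2PL) model, in which item information is \(a^2 P(1 - P)\), test information is the sum of the item informations, and the standard error of \(\theta\) is \(1/\sqrt{I(\theta)}\). All item parameters and helper names are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float]  # (a = discrimination, b = difficulty) under the 2PL


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta: float, items: Sequence[Item]) -> float:
    """Test information: the sum of the item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta: float, items: Sequence[Item]) -> float:
    """Standard error of the theta estimate: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical forms: 3 discriminating items located near theta = 0 versus
# 8 weakly discriminating items located far from theta = 0.
short_form = [(1.8, -0.2), (1.8, 0.0), (1.8, 0.2)]
long_form = [(0.8, -2.5)] * 4 + [(0.8, 2.5)] * 4

theta0 = 0.0
for name, items in (("short", short_form), ("long", long_form)):
    print(f"{name}: {len(items)} items, "
          f"I = {total_information(theta0, items):.3f}, "
          f"SE = {standard_error(theta0, items):.3f}")
```

At \(\theta = 0\), the three well-targeted, highly discriminating items provide more information, and therefore a smaller standard error, than the eight off-target items. Because information is a function of \(\theta\), the comparison may reverse at other trait levels.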

A final distinctive feature of IRT models concerns score distributions. In CTT, interval-scale properties are obtained by assuming normally distributed scores and linear relationships, whereas in IRT the relationship between the ability/trait level and the probability of endorsing the item is monotonic (i.e., a non-linear, increasing function, usually described by exponential models such as the logistic function).
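One common way to model this monotonic relationship is the three-parameter logistic (3PL) model, which uses the \(a_{j}\) (discrimination), \(b_{j}\) (difficulty), and \(c_{j}\) (pseudo-guessing) parameters listed in Table 15.1. This sketch omits the scaling constant \(D \approx 1.7\) that some texts include, and the item parameters are hypothetical. It shows that the probability always increases with \(\theta\), but by unequal amounts for equal steps in \(\theta\), unlike a linear model.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL ICC: c + (1 - c) / (1 + exp(-a * (theta - b))), without the D constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination a, difficulty b, pseudo-guessing c.
a, b, c = 1.2, 0.0, 0.2
thetas = [-3.0, -1.5, 0.0, 1.5, 3.0]
probs = [p_3pl(t, a, b, c) for t in thetas]

# Change in P for each equal step of 1.5 in theta: all positive (monotonic)
# but unequal in size (non-linear).
steps = [p_next - p_prev for p_prev, p_next in zip(probs, probs[1:])]

for t, p in zip(thetas, probs):
    print(f"theta = {t:+.1f}  P = {p:.3f}")
print("increments:", [round(s, 3) for s in steps])
```

The curve is steepest at \(\theta = b_{j}\) and flattens at the extremes, approaching the lower asymptote \(c_{j}\) for low \(\theta\) and 1 for high \(\theta\).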

Table 15.1: Differences between Classical Test Theory and Item Response Theory
| Feature | CTT | IRT |
|---|---|---|
| Model | Linear | Non-linear (monotonic) |
| Level of analysis | Test | Item |
| Item-trait relationship | Not specified | Item Characteristic Curve (ICC) |
| Assumptions | Weak | Strong |
| Item statistics | Point-biserial correlation | Parameters \(a_{j}\), \(b_{j}\), \(c_{j}\), and item information function |
| Sample size | N = 200 to 500 | N > 800 |
Note. CTT = Classical Test Theory. IRT = Item Response Theory.
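For reference, the item parameters and information function listed in Table 15.1 can be written out under the 3PL model. The formulas below omit the scaling constant \(D\), which is an assumption of this sketch. Here \(P_j(\theta)\) is the probability of a correct response to item \(j\), \(Q_j(\theta) = 1 - P_j(\theta)\), and \(n\) is the number of items:

```latex
P_j(\theta) = c_j + \frac{1 - c_j}{1 + \exp\left[-a_j(\theta - b_j)\right]}

I_j(\theta) = a_j^{2}\,\frac{Q_j(\theta)}{P_j(\theta)}\left[\frac{P_j(\theta) - c_j}{1 - c_j}\right]^{2}

I(\theta) = \sum_{j=1}^{n} I_j(\theta), \qquad \mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```

When \(c_j = 0\), the item information reduces to \(a_j^{2} P_j(\theta) Q_j(\theta)\), which is the 2PL case.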