Likert-type scales are frequently used in medical education and medical education research. Common uses include end-of-rotation trainee feedback, faculty evaluations of trainees, and assessment of performance after an educational intervention. A sizable percentage of the educational research manuscripts submitted to the Journal of Graduate Medical Education employ a Likert scale for part or all of the outcome assessments. Thus, understanding the interpretation and analysis of data derived from Likert scales is imperative for those working in medical education and education research. The goal of this article is to provide readers who do not have an extensive statistics background with the basics needed to understand these concepts.

Developed in 1932 by Rensis Likert¹ to measure attitudes, the typical Likert scale is a 5- or 7-point ordinal scale used by respondents to rate the degree to which they agree or disagree with a statement (table). In an ordinal scale, responses can be rated or ranked, but the distance between responses is not measurable. Thus, the differences between “always,” “often,” and “sometimes” on a frequency response Likert scale are not necessarily equal. In other words, one cannot assume that the responses are equidistant even though the numbers assigned to them are. This is in contrast to interval data, in which the difference between responses can be calculated and the numbers do refer to a measurable “something.” An example of interval data would be numbers of procedures done per resident: a score of 3 means the resident has conducted 3 procedures. Interestingly, with computer technology, survey designers can create continuous measure scales that do provide interval responses as an alternative to a Likert scale. The various continuous measures for pain are well-known examples of this (figure 1).
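To make the point about ordinal spacing concrete, the short Python sketch below may help. It is only an illustration: the two groups of responses and both numeric codings are made up for this example, and the only claim is that each coding preserves the order of the labels. Because the spacing between labels is not fixed by the scale, which group has the higher "mean" can depend on the arbitrary numbers chosen.

```python
from statistics import mean

# Two hypothetical numeric codings of the same frequency scale.
# Both keep the labels in the same order; only the spacing differs.
EQUAL_STEPS = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4, "always": 5}
UNEQUAL_STEPS = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4, "always": 10}

# Made-up responses from two groups of four respondents each.
group_a = ["often", "often", "often", "often"]
group_b = ["rarely", "rarely", "always", "always"]


def coded_mean(responses, coding):
    """Return the mean of the numeric codes assigned to a list of labels."""
    return mean(coding[label] for label in responses)


mean_a_equal = coded_mean(group_a, EQUAL_STEPS)
mean_b_equal = coded_mean(group_b, EQUAL_STEPS)
mean_a_unequal = coded_mean(group_a, UNEQUAL_STEPS)
mean_b_unequal = coded_mean(group_b, UNEQUAL_STEPS)

print("equal steps:  ", mean_a_equal, "vs", mean_b_equal)
print("unequal steps:", mean_a_unequal, "vs", mean_b_unequal)
```

With equally spaced codes group A has the higher mean, while with the unequally spaced codes group B does, even though no respondent changed an answer.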
FIGURE 1. Continuous Measure Example. Please tell us your current pain level by sliding the pointer to the appropriate point along the continuous pain scale above.

The Controversy

In the medical education literature, there has been a long-standing controversy regarding whether ordinal data, converted to numbers, can be treated as interval data.² That is, can means, standard deviations, and parametric statistics, which depend upon data that are normally distributed (figure 2), be used to analyze ordinal data?

When conducting research, we measure data from a sample of the total population of interest, not from all members of the population. Parametric tests make assumptions about the underlying population from which the research data have been obtained, usually that these population data are normally distributed. Nonparametric tests do not make this assumption about the “shape” of the population from which the study data have been drawn. When the parametric assumptions are met, nonparametric tests are generally less powerful than parametric tests and usually require a larger sample size (n value) to have the same power to find a difference between groups when a difference actually exists.

Descriptive statistics, such as means and standard deviations, have unclear meanings when applied to Likert scale responses. For example, what does the average of “never” and “rarely” really mean? Does “rarely and a half” have a useful meaning?³ Furthermore, if responses are clustered at the high and low extremes, the mean may appear to be the neutral or middle response, but this may not fairly characterize the data. This clustering of extremes is common, for example, in trainee evaluations of experiences that may be very popular with one group and perceived as unnecessary by others (eg, an epidemiology course in medical school). Other non-normal distributions of response data can similarly result in a mean score that is not a helpful measure of the data's central tendency.
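The clustering problem can be sketched in a few lines of Python. The ratings below are invented for illustration (a hypothetical course rated on a 5-point agreement scale, with half of the respondents at each extreme); they are not data from any study.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ratings on a 5-point agreement scale
# (1 = strongly disagree ... 5 = strongly agree), clustered at the extremes.
ratings = [1] * 10 + [5] * 10

average = mean(ratings)
middle = median(ratings)
counts = Counter(ratings)  # how many respondents chose each score

print(f"mean = {average}, median = {middle}")

# A simple text bar chart of the frequency of each response.
for score in range(1, 6):
    print(score, "#" * counts[score])
```

In this made-up sample the mean lands on the neutral score of 3 even though no respondent chose it; the frequency counts, by contrast, show the two clusters directly.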
FIGURE 2. A Normal Distribution

The Bottom Line

Now that many experts have weighed in on this debate, the conclusions ar