Analyzing and Interpreting Data From Likert-Type Scales

Gail M.; Anthony R.

Abstract

Likert-type scales are frequently used in medical education and medical education research. Common uses include end-of-rotation trainee feedback, faculty evaluations of trainees, and assessment of performance after an educational intervention. A sizable percentage of the educational research manuscripts submitted to the Journal of Graduate Medical Education employ a Likert scale for part or all of the outcome assessments. Thus, understanding the interpretation and analysis of data derived from Likert scales is imperative for those working in medical education and education research. The goal of this article is to provide readers who do not have an extensive statistics background with the basics needed to understand these concepts.

Developed in 1932 by Rensis Likert [1] to measure attitudes, the typical Likert scale is a 5- or 7-point ordinal scale used by respondents to rate the degree to which they agree or disagree with a statement (table). In an ordinal scale, responses can be rated or ranked, but the distance between responses is not measurable. Thus, the differences between "always," "often," and "sometimes" on a frequency response Likert scale are not necessarily equal. In other words, one cannot assume that the responses are equidistant even though the numbers assigned to them are. This is in contrast to interval data, in which the difference between responses can be calculated and the numbers do refer to a measurable "something." An example of interval data would be the number of procedures performed per resident: a score of 3 means the resident has conducted 3 procedures. Interestingly, with computer technology, survey designers can create continuous measure scales that do provide interval responses as an alternative to a Likert scale. The various continuous measures for pain are well-known examples of this (figure 1).
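The point about unequal distances can be illustrated with a minimal sketch. The two groups of responses and the alternative "stretched" coding below are hypothetical and are not taken from the article. Both codings respect the order of the response options, yet they disagree about which group has the higher mean, while the comparison of medians is unaffected.

```python
from statistics import mean, median

# Hypothetical frequency-scale responses, coded 1-5
# (1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = always).
group_a = [1, 1, 5, 5, 5]
group_b = [3, 3, 4, 4, 4]

# Two codings that preserve the ORDER of the responses.
# The second one is an illustrative assumption: it treats "always"
# as much further from "often" than the other steps are from each other.
equal_steps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def recode(responses, coding):
    """Map each ordinal response to the number the coding assigns it."""
    return [coding[r] for r in responses]


def compare(coding):
    """Report whether group A scores higher than group B by mean and by median."""
    a = recode(group_a, coding)
    b = recode(group_b, coding)
    return {
        "mean_a_higher": mean(a) > mean(b),
        "median_a_higher": median(a) > median(b),
    }


equal_result = compare(equal_steps)
stretched_result = compare(stretched_top)

print("equal steps:  ", equal_result)
print("stretched top:", stretched_result)
```

With these made-up numbers, the comparison of means reverses when the coding changes, whereas the comparison of medians gives the same answer under both codings, because the median depends only on rank order.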
FIGURE 1. Continuous Measure Example. Please tell us your current pain level by sliding the pointer to the appropriate point along the continuous pain scale above.

The Controversy

In the medical education literature, there has been a long-standing controversy regarding whether ordinal data, converted to numbers, can be treated as interval data [2]. That is, can means, standard deviations, and parametric statistics, which depend upon data that are normally distributed (figure 2), be used to analyze ordinal data?

When conducting research, we measure data from a sample of the total population of interest, not from all members of the population. Parametric tests make assumptions about the underlying population from which the research data have been obtained, usually that these population data are normally distributed. Nonparametric tests do not make this assumption about the "shape" of the population from which the study data have been drawn. Nonparametric tests are generally less powerful than parametric tests and usually require a larger sample size (n value) to have the same power to find a difference between groups when a difference actually exists.

Descriptive statistics, such as means and standard deviations, have unclear meanings when applied to Likert scale responses. For example, what does the average of "never" and "rarely" really mean? Does "rarely and a half" have a useful meaning? [3] Furthermore, if responses are clustered at the high and low extremes, the mean may appear to be the neutral or middle response, but this may not fairly characterize the data. This clustering at the extremes is common, for example, in trainee evaluations of experiences that may be very popular with one group and perceived as unnecessary by others (eg, an epidemiology course in medical school). Other non-normal distributions of response data can similarly result in a mean score that is not a helpful measure of the data's central tendency.
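The clustering problem can be made concrete with a short sketch. The 25 responses below are hypothetical, constructed only to mimic a course that one group of trainees rates very highly and another rates very poorly. The mean lands on the neutral midpoint even though almost no one chose it.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical 5-point agreement ratings (1 = strongly disagree,
# 5 = strongly agree) from 25 trainees, clustered at both extremes.
responses = [1] * 10 + [2] * 2 + [3] * 1 + [4] * 2 + [5] * 10

mean_score = mean(responses)      # arithmetic mean of the numeric codes
median_score = median(responses)  # middle response when sorted
modes = multimode(responses)      # most frequent response(s)
counts = Counter(responses)       # how many trainees chose each option

print("mean:  ", mean_score)
print("median:", median_score)
print("modes: ", modes)
for option in range(1, 6):
    # Frequency table: the shape of the data that the mean hides.
    print(f"  option {option}: {counts[option]:2d} responses")
```

Here the mean and median both equal the neutral option, which only 1 of the 25 hypothetical respondents selected. Reporting the frequency of each response (or the modes) describes these data more faithfully than the mean does.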
FIGURE 2. A Normal Distribution

The Bottom Line

Now that many experts have weighed in on this debate, the conclusions ar

