
Summary measures of agreement and association between many raters’ ordinal classifications



Abstract

Interpretation of screening tests such as mammograms usually requires a radiologist’s subjective visual assessment of images, often resulting in substantial discrepancies between radiologists’ classifications of subjects’ test results. In clinical screening studies that assess the strength of agreement between experts, multiple raters are often recruited to classify subjects’ test results on an ordinal scale. However, using traditional measures of agreement in some studies is challenging due to the presence of many raters, the use of an ordinal classification scale, and unbalanced data. We assess and compare the performance of existing measures of agreement and association, as well as a newly developed model-based measure of agreement, when applied to three large-scale clinical screening studies involving many raters’ ordinal classifications. We also conduct a simulation study to demonstrate the key properties of the summary measures. The assessed agreement and association varied according to the choice of summary measure. Some measures were influenced by the underlying prevalence of disease and raters’ marginal distributions and/or were limited in use to balanced data sets where every rater classifies every subject. Our simulation study indicated that popular measures of agreement and association are sensitive to the underlying disease prevalence. Model-based measures provide a flexible approach for calculating agreement and association and are robust to missing and unbalanced data as well as the underlying disease prevalence.
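
To make the balanced-data restriction concrete, the following is a minimal sketch of one traditional multi-rater measure, Fleiss' kappa. The abstract does not name the specific measures compared, so the choice of Fleiss' kappa here is an assumption made purely for illustration, and the function name `fleiss_kappa` and the toy ratings are hypothetical. Note that Fleiss' kappa treats categories as nominal, so it ignores the ordering of an ordinal scale, and its formula requires the same number of ratings for every subject.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a balanced design (illustrative sketch only).

    ratings:    list with one entry per subject; each entry is the list of
                category labels assigned to that subject by the raters.
    categories: iterable of all possible category labels.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # The formula needs an equal number of ratings per subject,
    # so unbalanced or missing data cannot be handled directly.
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: proportion of agreeing rater pairs, per subject.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the pooled marginal category proportions.
    marginals = [
        sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories
    ]
    p_chance = sum(p * p for p in marginals)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data (hypothetical): 4 subjects, 3 raters, ordinal scale 1..3.
toy = [[1, 1, 1], [2, 2, 3], [3, 3, 3], [1, 2, 2]]
kappa = fleiss_kappa(toy, categories=[1, 2, 3])
print(round(kappa, 3))
```

Because chance agreement is computed from the pooled marginal proportions, the value of kappa shifts when the category distribution (for example, disease prevalence) changes, which is the kind of dependence the abstract describes for popular measures.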
