Journal of Medical Internet Research

Improving Consensus Scoring of Crowdsourced Data Using the Rasch Model: Development and Refinement of a Diagnostic Instrument


Abstract

Background: Diabetic retinopathy (DR) is a leading cause of vision loss in working-age individuals worldwide. Although screening is effective and cost-effective, it remains underutilized, and novel methods are needed to increase detection of DR. This clinical validation study compared diagnostic gradings of retinal fundus photographs provided by volunteers on the Amazon Mechanical Turk (AMT) crowdsourcing marketplace with expert-provided gold-standard grading, and explored whether the consensus of crowdsourced classifications could be improved beyond a simple majority vote (MV) using regression methods.

Objective: The aim of our study was to determine whether regression methods could be used to improve the consensus grading of data collected by crowdsourcing.

Methods: A total of 1200 retinal images of individuals with diabetes mellitus from the Messidor public dataset were posted to AMT. Eligible crowdsourcing workers had at least 500 previously approved tasks with an approval rating of 99% across their prior submitted work. A total of 10 workers were recruited to classify each image as normal or abnormal. If half or more of the workers judged the image to be abnormal, the MV consensus grade was recorded as abnormal. Rasch analysis was then used to calculate worker ability scores in a random 50% training set, and these scores were used as weights in a regression model in the remaining 50% test set to determine whether a more accurate consensus could be devised. Outcomes of interest were the percentage of correctly classified images, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) for the consensus grade as compared with the expert grading provided with the dataset.

Results: Using MV grading, the consensus was correct in 75.5% of images (906/1200), with 75.5% sensitivity, 75.5% specificity, and an AUROC of 0.75 (95% CI 0.73-0.78). A logistic regression model using Rasch-weighted individual scores generated an AUROC of 0.91 (95% CI 0.88-0.93) compared with 0.89 (95% CI 0.86-0.92) for a model using unweighted scores (chi-square P value<.001). With a diagnostic cut-point set to optimize sensitivity at 90%, 77.5% of images (465/600) were graded correctly, with 90.3% sensitivity, 68.5% specificity, and an AUROC of 0.79 (95% CI 0.76-0.83).

Conclusions: Crowdsourced interpretations of retinal images provide rapid and accurate results as compared with a gold-standard grading. Creating a logistic regression model that uses Rasch analysis to weight crowdsourced classifications by worker ability improves the accuracy of aggregated grades as compared with a simple majority vote.
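The majority-vote rule and the sensitivity and specificity outcomes described in the Methods can be illustrated with a minimal Python sketch. The function names and the toy votes below are hypothetical and are not taken from the study. Only the tie rule (half or more abnormal votes gives an abnormal consensus) comes from the abstract. The Rasch-weighted regression step is not shown, because the abstract does not specify it in enough detail.

```python
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 (abnormal) if half or more of the votes are 1, else 0 (normal)."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


def sensitivity_specificity(predicted: Sequence[int],
                            truth: Sequence[int]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) of binary predictions against a reference."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    # Guard against an empty class; NaN signals "undefined" rather than raising.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy data (made up for illustration): 4 images, 10 worker votes each.
worker_votes = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # 5 of 10 abnormal: tie counts as abnormal
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 of 10 abnormal: normal
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 7 of 10 abnormal: abnormal
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # 4 of 10 abnormal: normal
]
expert_grades = [1, 0, 1, 1]  # hypothetical gold-standard labels

consensus = [majority_vote(v) for v in worker_votes]
sens, spec = sensitivity_specificity(consensus, expert_grades)
print(consensus, round(sens, 3), spec)
```

In this toy example the last image is missed by the majority vote, so sensitivity is 2/3 while specificity is 1.0. Per-image vote counts like these are also the kind of input that a weighted model could use in place of the fixed half-or-more threshold.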
