Journal: IEEE Transactions on Human-Machine Systems

Leveraging Pattern Recognition Consistency Estimation for Crowdsourcing Data Analysis


Abstract

Crowdsourcing is an effective method for analyzing large scientific databases. However, data annotation relies on untrained volunteers, which makes the quality of the annotations difficult to control. Here, we propose a method to estimate the consistency of the annotations made by human classifiers in citizen science projects. Since the performance of supervised machine learning systems decreases as the level of noise in the data increases, the method is able to rank human annotators by the consistency with which they annotate. Because the method uses the accuracy of an automatic classifier trained with the samples annotated by that annotator, it does not require ground truth or data annotated by other citizen scientists. The method can reduce the number of annotations required for each sample by identifying the most efficient data annotators, and can improve the overall quality of the data by giving higher weights to the classifications of the more consistent data annotators. The proposed method can also be used to improve the citizen science user experience by providing feedback in real time. Experimental results using a large citizen science project and a subset of its data annotations made by 4000 citizen scientists show a Pearson correlation of 0.966 between the quality estimation provided by the method and the actual performance of the data annotators. The method also demonstrated efficacy in improving the performance of statistical consensus methods.
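The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The nearest-centroid classifier, the 5-fold cross-validation, the synthetic two-class data, the annotator names and noise rates, and the use of the raw score as a voting weight are all assumptions made for illustration. Each simulated annotator's labels train a classifier, and the accuracy of that classifier against the same annotator's held-out labels serves as the consistency score, so no ground truth enters the score.

```python
import random

def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def consistency_score(X, y, folds=5):
    """Cross-validated accuracy measured against the annotator's OWN labels."""
    n, correct = len(X), 0
    for k in range(folds):
        test = set(range(k, n, folds))
        train_X = [X[i] for i in range(n) if i not in test]
        train_y = [y[i] for i in range(n) if i not in test]
        centroids = fit_centroids(train_X, train_y)
        correct += sum(predict(centroids, X[i]) == y[i] for i in test)
    return correct / n

def weighted_vote(labels_by_annotator, weights):
    """Per-sample consensus: the label with the largest summed annotator weight."""
    names = list(labels_by_annotator)
    n = len(labels_by_annotator[names[0]])
    consensus = []
    for i in range(n):
        totals = {}
        for name in names:
            label = labels_by_annotator[name][i]
            totals[label] = totals.get(label, 0.0) + weights[name]
        consensus.append(max(totals, key=totals.get))
    return consensus

def demo(seed=0, n=600):
    """Simulate three hypothetical annotators with different label-noise rates."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n)]
    # Two well-separated 2-D Gaussian classes (synthetic, not the paper's data).
    X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(3.0 * t, 1.0)] for t in truth]
    noise = {"careful": 0.05, "average": 0.20, "noisy": 0.40}  # assumed rates
    labels = {name: [1 - t if rng.random() < p else t for t in truth]
              for name, p in noise.items()}
    # The true labels are used only to simulate annotators, never for scoring.
    scores = {name: consistency_score(X, y) for name, y in labels.items()}
    return scores, labels, truth

scores, labels, truth = demo()
ranking = sorted(scores, key=scores.get, reverse=True)
print("ranking by consistency:", ranking)
# Assumption: the consistency score is used directly as the voting weight.
consensus = weighted_vote(labels, scores)
```

In this sketch, an annotator who flips more labels gives the classifier a noisier training set and a noisier test set, so the cross-validated accuracy drops. That drop is what makes a ranking possible without reference labels. Any supervised classifier could stand in for the nearest-centroid model.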
