Leveraging Pattern Recognition Consistency Estimation for Crowdsourcing Data Analysis

Lior Shamir; Derek Diamond; John Wallin

Abstract

Crowdsourcing is an effective method for analyzing large scientific databases. However, data annotation relies on untrained volunteers, making it difficult to control the quality of the annotation. Here, we propose a method to estimate the consistency of the annotations of human classifiers in citizen science projects. Since the performance of supervised machine learning systems decreases as the level of noise in the data increases, the method is able to rank human annotators by the consistency with which they annotate. Because the method uses the accuracy of an automatic classifier trained with these samples, it does not require ground truth or data annotated by other citizen scientists. The method can reduce the number of annotations required for each sample by identifying the most efficient data annotators, and can improve the overall quality of the data by giving higher weights to the classifications of the more consistent data annotators. The proposed method can also be used for improving the citizen science user experience by providing feedback in real time. Experimental results using a large citizen science project and a subset of its data annotations made by 4000 citizen scientists show a Pearson correlation of 0.966 between the quality estimation provided by the method and the actual performance of the data annotators. The method also demonstrated efficacy in improving the performance of statistical consensus methods.
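The idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the 5-fold split, the synthetic two-feature data, the annotator names, and the use of the raw score as a voting weight. The sketch trains a classifier on each annotator's own labels, uses its cross-validated accuracy as a consistency score, and then uses those scores to weight votes. No ground truth is consulted when scoring; the true labels appear only to simulate annotators with different noise levels.

```python
import random
from collections import defaultdict

def nearest_centroid_predict(train_x, train_y, x):
    """Return the label whose class centroid is closest to x (hypothetical stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(train_x, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((c - v) ** 2 for c, v in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def consistency_score(features, labels, folds=5):
    """Cross-validated accuracy of a classifier trained on one annotator's own labels."""
    n = len(features)
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(n) if i % folds != fold]
        test_idx = [i for i in range(n) if i % folds == fold]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
                correct += 1
    return correct / n

def weighted_vote(votes, weights):
    """Pick the label with the largest total weight; votes and weights are keyed by annotator."""
    totals = defaultdict(float)
    for annotator, label in votes.items():
        totals[label] += weights[annotator]
    return max(totals, key=totals.get)

def simulate_annotator(true_labels, noise, rng):
    """Flip each binary label with probability `noise` (simulation only)."""
    return [1 - y if rng.random() < noise else y for y in true_labels]

rng = random.Random(0)
true_labels = [i % 2 for i in range(400)]
# Two well-separated synthetic classes in a 2-D feature space.
features = [[rng.gauss(3.0 * y, 1.0), rng.gauss(-3.0 * y, 1.0)] for y in true_labels]

scores = {}
for name, noise in [("careful", 0.02), ("average", 0.2), ("random", 0.5)]:
    labels = simulate_annotator(true_labels, noise, rng)
    scores[name] = consistency_score(features, labels)

# Annotators ordered from most to least consistent.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Noisier labels are harder for the classifier to reproduce, so the score drops as the simulated noise level rises. Any classifier and validation scheme could replace the ones assumed here.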

Bibliographic details

Source: Human-Machine Systems, IEEE Transactions on | 2016, Issue 3 | pp. 474-480 | 7 pages
Authors: Lior Shamir; Derek Diamond; John Wallin
Author affiliation: Department of Computer Science, Lawrence Technological University, Southfield, MI, USA
Original format: PDF
Text language: English
Keywords: Big data; citizen science; crowdsourcing; data analysis
Date added to database: 2022-08-18 01:15:46

Similar literature

1. A pattern recognition based approach to consistency analysis of geophysical datasets [J]. Anish C. Turlapaty, Valentine G. Anantharaj, Nicolas H. Younan. Computers & Geosciences, 2010, Issue 4.
2. Using Exploratory Spatial Data Analysis to Leverage Social Indicator Databases: The Discovery of Interesting Patterns [J]. Luc Anselin, Sanjeev Sridharan, Susan Gholston. Social Indicators Research, 2007, Issue 2.
3. Variations in Practice Patterns and Consistency With Published Guidelines for Balloon Aortic and Pulmonary Valvuloplasty: An Analysis of Data From the IMPACT Registry [J]. Glatz Andrew C., Kennedy Kevin F., Rome Jonathan J. JACC. Cardiovascular Interventions, 2018, Issue 6.
4. Knowing your enemies: leveraging data analysis to expose phishing patterns against a major US financial institution [C]. Javier Vargas, Alejandro Correa Bahnsen, Sergio Villegas. APWG Symposium on Electronic Crime Research, 2016.
5. Applications of Pattern Recognition Entropy (PRE) and Informatics to Data Analysis [D]. Chatterjee, Shiladitya. 2019.
6. Variations in practice patterns and consistency with published guidelines for balloon aortic and pulmonary valvuloplasty: An analysis of data from the IMPACT® Registry [O]. Andrew C Glatz, Kevin F Kennedy, Jonathan J Rome.
7. Pattern recognition of forest canopy using the airborne hyperspectral data and multi-bands high spatial resolution satellite sensor worldview-2 data. A results comparison and accuracy estimation [O]. V. V. Kozoderov, V. D. Egorov. 2019.
8. Improving Process Monitoring Data for Nuclear Fuel Reprocessing Plants Using State Estimation and Pattern Recognition [R]. R. W. Gilchrist, J. E. Bennett, W. J. Barnett. 1978.
