MIS Quarterly

Expecting the Unexpected: Effects of Data Collection Design Choices on the Quality of Crowdsourced User-Generated Content

Abstract

As crowdsourced user-generated content becomes an important source of data for organizations, a pressing question is how to ensure that data contributed by ordinary people outside traditional organizational boundaries is of suitable quality to be useful for both known and unanticipated purposes. This research examines the impact of different information quality management strategies, and their corresponding data collection design choices, on key dimensions of information quality in crowdsourced user-generated content. We conceptualize a contributor-centric information quality management approach focused on instance-based data collection. We contrast it with the traditional consumer-centric, fitness-for-use conceptualization of information quality, which emphasizes class-based data collection. We present laboratory and field experiments, conducted in a citizen science domain, that demonstrate trade-offs among the quality dimensions of accuracy, completeness (including discoveries), and precision under the two information quality management approaches and their corresponding data collection designs. Specifically, we show that instance-based data collection yields higher accuracy, greater dataset completeness, and more discoveries, but at the expense of lower precision. We further validate the practical value of the instance-based approach through an applicability check with potential data consumers (scientists, in our citizen science context). In a follow-up study using human experts and supervised machine learning techniques, we show that post-processing can achieve substantial precision gains on instance-based data. We conclude by discussing the benefits and limitations of different information quality management and data collection design choices for crowdsourced user-generated content.