Journal: MIS Quarterly

Expecting the Unexpected: Effects of Data Collection Design Choices on the Quality of Crowdsourced User-Generated Content

Abstract

As crowdsourced user-generated content becomes an important source of data for organizations, a pressing question is how to ensure that data contributed by ordinary people outside of traditional organizational boundaries is of suitable quality to be useful for both known and unanticipated purposes. This research examines the impact of different information quality management strategies, and corresponding data collection design choices, on key dimensions of information quality in crowdsourced user-generated content. We conceptualize a contributor-centric information quality management approach focusing on instance-based data collection. We contrast it with the traditional consumer-centric fitness-for-use conceptualization of information quality that emphasizes class-based data collection. We present laboratory and field experiments conducted in a citizen science domain that demonstrate trade-offs between the quality dimensions of accuracy, completeness (including discoveries), and precision between the two information management approaches and their corresponding data collection designs. Specifically, we show that instance-based data collection results in higher accuracy, dataset completeness, and number of discoveries, but this comes at the expense of lower precision. We further validate the practical value of the instance-based approach by conducting an applicability check with potential data consumers (scientists, in our context of citizen science). In a follow-up study, we show, using human experts and supervised machine learning techniques, that substantial precision gains on instance-based data can be achieved with post-processing. We conclude by discussing the benefits and limitations of different information quality and data collection design choices for information quality in crowdsourced user-generated content.
