Venue: Conference on Advanced Information Systems Engineering

CrowdCorrect: A Curation Pipeline for Social Data Cleansing and Curation


Abstract

Process and data are equally important for business process management. Data-driven approaches in process analytics aim to value decisions that can be backed up with verifiable private and open data. Over the last few years, data-driven analysis of how knowledge workers and customers interact in social contexts, often with data obtained from social networking services such as Twitter and Facebook, has become a vital asset for organizations. For example, governments have started to extract knowledge and derive insights from vastly growing open data to improve their services. A key challenge in analyzing social data is to understand the raw data generated by social actors and prepare it for analytic tasks. In this context, it is important to transform the raw data into contextualized data and knowledge. This task, known as data curation, involves identifying relevant data sources, extracting data and knowledge, and cleansing, maintaining, merging, enriching and linking data and knowledge. In this paper we present CrowdCorrect, a data curation pipeline that enables analysts to cleanse and curate social data and prepare it for reliable business data analytics. The first step offers automatic feature extraction, correction and enrichment. Next, we design micro-tasks and use the knowledge of the crowd to identify and correct information items that could not be corrected in the first step. Finally, we offer a domain-model mediated method to use the knowledge of domain experts to identify and correct items that could not be corrected in previous steps. We adopt a typical scenario for analyzing Urban Social Issues from Twitter as it relates to the Government Budget, to highlight how CrowdCorrect significantly improves the quality of extracted knowledge compared to a classical curation pipeline that operates without the knowledge of the crowd and domain experts.
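The three-step escalation described in the abstract (automatic correction first, then the crowd, then domain experts) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Item`, `run_pipeline`, `automatic_stage`, `crowd_stage` and `expert_stage`, the toy vocabularies, and the majority-vote rule are hypothetical and are not taken from the CrowdCorrect paper, which does not specify its implementation in this abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical record for one information item (here: a single token).
@dataclass
class Item:
    text: str
    corrected: Optional[str] = None
    resolved_by: Optional[str] = None  # name of the stage that resolved it

# A stage returns a correction, or None when it cannot decide.
Stage = Callable[[str], Optional[str]]

# Toy data standing in for real resources (all assumed, not from the paper).
VOCAB = {"health", "hospital", "budget"}
AUTO_FIXES = {"helth": "health"}
CROWD_ANSWERS = {"hosptl": ["hospital", "hospital", "hostel"]}
EXPERT_GLOSSARY = {"gp": "general practitioner"}

def automatic_stage(token: str) -> Optional[str]:
    """Step 1: accept known words, apply known automatic fixes."""
    if token in VOCAB:
        return token
    return AUTO_FIXES.get(token)

def crowd_stage(token: str) -> Optional[str]:
    """Step 2: accept a crowd answer only with a strict majority."""
    answers = CROWD_ANSWERS.get(token)
    if not answers:
        return None
    winner, votes = Counter(answers).most_common(1)[0]
    return winner if votes > len(answers) / 2 else None

def expert_stage(token: str) -> Optional[str]:
    """Step 3: fall back to a domain glossary maintained by experts."""
    return EXPERT_GLOSSARY.get(token)

STAGES: List[Tuple[str, Stage]] = [
    ("automatic", automatic_stage),
    ("crowd", crowd_stage),
    ("expert", expert_stage),
]

def run_pipeline(items: List[Item], stages: List[Tuple[str, Stage]]) -> List[Item]:
    """Pass each item through the stages in order; stop at the first that resolves it."""
    for item in items:
        for name, stage in stages:
            result = stage(item.text)
            if result is not None:
                item.corrected = result
                item.resolved_by = name
                break
    return items

if __name__ == "__main__":
    tokens = ["helth", "hosptl", "gp", "zzz"]
    for item in run_pipeline([Item(t) for t in tokens], STAGES):
        print(item.text, "->", item.corrected, f"({item.resolved_by})")
```

In this sketch an item only reaches a later, more expensive stage when every earlier stage returns `None`, which mirrors the abstract's wording that each step handles items "that could not be corrected" previously. Items that no stage resolves keep `corrected = None` so they remain visible for later review.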
