
Using microtasks to crowdsource DBpedia entity classification: A study in workflow design

Abstract

DBpedia is at the core of the Linked Open Data Cloud and widely used in research and applications. However, it is far from perfect. Its content suffers from many flaws, resulting from factual errors inherited from Wikipedia or incomplete mappings from Wikipedia infoboxes to the DBpedia ontology. In this work we focus on one class of such problems: untyped entities. We propose a hierarchical, tree-based approach to categorize DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyse the main dimensions of the crowdsourcing exercise in depth in order to derive suggestions for workflow design, and we study three different workflows with automatic and hybrid prediction mechanisms to select candidates for the most specific class from the DBpedia ontology. To test our approach, we run experiments on CrowdFlower using a gold-standard dataset of 120 previously unclassified entities. In our studies, human-computation-driven approaches generally achieved higher precision at lower cost than workflows with automatic predictors. However, each of the tested workflows has its merits, and none of them performs exceptionally well on the entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond.
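The hierarchical, tree-based idea described in the abstract can be sketched as a top-down traversal of the class tree, with one microtask per level: workers are shown the subclasses of the current class and pick the one that fits, narrowing down to the most specific class. This is a minimal illustrative sketch, not the paper's actual interface; the `ask_crowd` stub and the toy ontology fragment are assumptions standing in for real CrowdFlower tasks and the full DBpedia ontology.

```python
# Toy fragment of the DBpedia class hierarchy (illustrative subset only).
ONTOLOGY = {
    "Thing": ["Agent", "Place", "Work"],
    "Agent": ["Person", "Organisation"],
    "Person": ["Athlete", "Artist"],
}

def ask_crowd(entity, candidates):
    """Stand-in for a paid microtask: a worker picks the best-fitting
    subclass for `entity`, or None if none applies. Here the worker is
    simulated with a fixed answer key (hypothetical data)."""
    answers = {"Usain Bolt": {"Agent", "Person", "Athlete"}}
    for candidate in candidates:
        if candidate in answers.get(entity, set()):
            return candidate
    return None

def classify(entity, root="Thing"):
    """Descend the class tree, one crowd question per level; return the
    most specific class the crowd confirmed for the entity."""
    current = root
    while current in ONTOLOGY:
        choice = ask_crowd(entity, ONTOLOGY[current])
        if choice is None:
            break  # no subclass fits: stop at the current class
        current = choice
    return current

print(classify("Usain Bolt"))  # -> Athlete
```

In this scheme the number of questions per entity grows with the depth of the ontology rather than its total size, which is what makes a tree-based workflow attractive for microtask budgets; the paper's automatic and hybrid variants replace or pre-filter some of these crowd questions with predicted candidates.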
