Conference: IEEE International Conference on Data Engineering

Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms

Abstract

Domain-specific knowledge bases play an increasingly important role in a variety of real applications. In this paper, we use the product knowledge base of Taobao, the largest Chinese e-commerce platform, as an example to investigate a completion procedure for a domain-specific knowledge base. We argue that domain-specific knowledge bases tend to be incomplete, and remain oblivious to their incompleteness, unless a continuous completion procedure is in place. The key component of this completion procedure is the classification of emerging query terms into the corresponding properties of categories in the existing taxonomy. We propose using query logs to complete the product knowledge base of Taobao. However, query-driven completion faces several challenges, including distinguishing the fine-grained semantics of unrecognized terms and handling sparse data. We propose a graph-based solution to overcome these challenges. We first construct a large body of positive evidence to establish the semantic similarity between terms, and then run a shortest path or, alternatively, a random walk on the similarity graph, under a set of constraints derived from negative evidence, to find the best candidate property for each emerging query term. Finally, we conduct extensive experiments on real data from Taobao and on a subset of CN-DBpedia. The results show that our solution classifies emerging query terms with good performance. Our solution is already deployed in Taobao, where it has helped find nearly 7 million new values for properties. The completed product knowledge base improves the ratio of recognized queries and the ratio of recognized terms by more than 25% and 32%, respectively.
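The shortest-path variant described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_term`, the edge cost `-log(similarity)`, the treatment of negative evidence as a set of properties excluded for the query term, and the example terms are all assumptions made for illustration.

```python
import heapq
import math


def classify_term(term, similarity, known_property, forbidden):
    """Return (property, path_score) of the closest admissible labeled term.

    similarity:     dict mapping (a, b) term pairs to a similarity in (0, 1]
                    (stands in for the positive evidence; assumed format).
    known_property: dict mapping already recognized terms to their property.
    forbidden:      set of properties ruled out for `term`
                    (stands in for the negative evidence; assumed format).
    """
    # Build an undirected graph. Using -log(sim) as the edge cost means the
    # shortest path maximizes the product of similarities along the path.
    graph = {}
    for (a, b), sim in similarity.items():
        cost = -math.log(sim)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    # Dijkstra's algorithm from the emerging term.
    best = {term: 0.0}
    heap = [(0.0, term)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        prop = known_property.get(node)
        if prop is not None and prop not in forbidden:
            # First admissible labeled term popped is the nearest one.
            return prop, math.exp(-dist)
        for neighbor, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(neighbor, math.inf):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return None, 0.0


# Hypothetical example data, not taken from the paper.
similarity = {
    ("oversize", "loose"): 0.8,
    ("oversize", "xl"): 0.5,
    ("loose", "slim"): 0.6,
}
known_property = {"loose": "fit", "xl": "size", "slim": "fit"}

unconstrained = classify_term("oversize", similarity, known_property, set())
constrained = classify_term("oversize", similarity, known_property, {"fit"})
```

In this toy graph the unconstrained search assigns "oversize" to the property "fit" through its strongest neighbor, while excluding "fit" makes the search fall back to "size". The sketch still lets paths pass through terms whose property is excluded; a stricter reading of the constraints might prune those nodes instead.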