IEEE International Conference on Data Engineering

Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms

Abstract

Domain-specific knowledge bases play an increasingly important role in a variety of real applications. In this paper, we use the product knowledge base of Taobao, the largest Chinese e-commerce platform, as an example to investigate a completion procedure for a domain-specific knowledge base. We argue that domain-specific knowledge bases tend to be incomplete, and remain oblivious to their incompleteness, unless a continuous completion procedure is in place. The key component of this completion procedure is the classification of emerging query terms into the corresponding properties of categories in the existing taxonomy. We propose using query logs to complete the product knowledge base of Taobao. However, query-driven completion faces many challenges, including distinguishing the fine-grained semantics of unrecognized terms and handling sparse data. We propose a graph-based solution to overcome these challenges. We first construct a large body of positive evidence to establish the semantic similarity between terms, and then run a shortest-path search, or alternatively a random walk, on the similarity graph under a set of constraints derived from negative evidence to find the best candidate property for each emerging query term. Finally, we conduct extensive experiments on real data from Taobao and on a subset of CN-DBpedia. The results show that our solution classifies emerging query terms with good performance. Our solution is already deployed in Taobao, where it has helped find nearly 7 million new values for properties. The completed product knowledge base significantly improves the ratio of recognized queries and the ratio of recognized terms by more than 25% and 32%, respectively.
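
The abstract describes the graph step only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that positive evidence is given as pairwise similarities in (0, 1], that already-known terms carry a property label, and that negative evidence is given as (term, property) pairs to exclude. The function name `classify_term`, the cost `-log(similarity)`, and the toy terms are all hypothetical choices made for illustration. The sketch covers only the shortest-path variant, not the random walk.

```python
import heapq
import math


def classify_term(graph, labels, forbidden, source):
    """Return (property, path_cost) of the nearest allowed labeled term, or None.

    graph:     {term: {neighbor: similarity in (0, 1]}}  (assumed positive evidence)
    labels:    {known_term: property}                    (existing knowledge base)
    forbidden: set of (term, property) pairs             (assumed negative evidence)
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        prop = labels.get(node)
        # Dijkstra pops nodes in order of increasing cost, so the first
        # labeled node whose property is not ruled out is the closest one.
        if prop is not None and node != source and (source, prop) not in forbidden:
            return prop, d
        for nbr, sim in graph.get(node, {}).items():
            nd = d - math.log(sim)  # high similarity -> low edge cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None


# Toy data (hypothetical): "burgundy" is the emerging query term.
graph = {
    "burgundy": {"wine": 0.9, "maroon": 0.8},
    "maroon": {"burgundy": 0.8, "red": 0.9},
    "wine": {"burgundy": 0.9},
    "red": {"maroon": 0.9},
}
labels = {"wine": "flavor", "red": "color"}
forbidden = {("burgundy", "flavor")}

result = classify_term(graph, labels, forbidden, "burgundy")
print(result)
```

In this toy graph the closest labeled neighbor of "burgundy" is "wine", but the negative-evidence constraint excludes the "flavor" property, so the search continues through "maroon" and settles on "red" and its property "color". Using `-log(similarity)` as the edge cost makes the cheapest path the one with the largest product of similarities, which is one reasonable choice among several.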
