Conference: International University Communication Symposium

Large scale similarity-based relation expansion

Abstract

Recent advances in automatic knowledge acquisition methods make it possible to construct massive knowledge bases of semantic relations, containing information potentially unknown to their users. However, for certain data mining tasks, such as finding potential causes of a disease or side effects of a drug, where missing a small piece of information can have grave consequences, the coverage of automatically acquired knowledge bases is often insufficient. This paper explores the use of automatic hypothesis generation for expanding a knowledge base of semantic relations, using distributional word similarities obtained from a large Web corpus. If successful, such a method can drastically improve the coverage of automatically acquired semantic relations, at the expense of a slight reduction in accuracy. We show that large-scale similarity-based relation expansion works quite well for this purpose. Using a corpus of 100 million Japanese Web pages as input, we were able to generate a substantial number of new semantic relations that were not found in the input corpus but whose validity was confirmed in a much larger Web corpus, i.e., by using a commercial Web search engine.
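The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of the general idea, not the paper's actual method. It assumes that a relation is stored as a (relation type, argument 1, argument 2) triple, that each word has a list of distributionally similar words with a similarity score, and that a hypothesis is scored by the product of the similarities of its substituted arguments. The function name `expand_relations`, the `min_score` threshold, and the toy data are all hypothetical.

```python
from itertools import product


def expand_relations(seed_relations, similar_words, min_score=0.5):
    """Hypothesize new relations by swapping arguments for similar words.

    seed_relations: set of (relation_type, arg1, arg2) triples (assumed format).
    similar_words:  dict mapping a word to a list of (neighbour, similarity).
    min_score:      hypothetical cut-off; lower values trade accuracy for coverage.
    """
    hypotheses = {}
    for rel, arg1, arg2 in seed_relations:
        # Each argument may stay unchanged (similarity 1.0) or be replaced
        # by one of its distributionally similar words.
        cands1 = [(arg1, 1.0)] + similar_words.get(arg1, [])
        cands2 = [(arg2, 1.0)] + similar_words.get(arg2, [])
        for (new1, sim1), (new2, sim2) in product(cands1, cands2):
            candidate = (rel, new1, new2)
            score = sim1 * sim2  # assumed scoring rule, for illustration only
            if candidate in seed_relations or score < min_score:
                continue
            # Keep the best score if several seeds yield the same hypothesis.
            hypotheses[candidate] = max(score, hypotheses.get(candidate, 0.0))
    # Highest-scoring hypotheses first; ties broken by the triple itself.
    return sorted(hypotheses.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration.
seeds = {("cause", "smoking", "lung cancer")}
sims = {
    "smoking": [("tobacco", 0.8)],
    "lung cancer": [("bronchitis", 0.6)],
}

for triple, score in expand_relations(seeds, sims):
    print(triple, score)
```

In the setting the abstract describes, hypotheses generated this way would still need to be validated against independent evidence, such as a much larger Web corpus. That validation step is outside the scope of this sketch.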
