Large scale similarity-based relation expansion

Abstract

Recent advances in automatic knowledge acquisition methods make it possible to construct massive knowledge bases of semantic relations, containing information potentially unknown to their users. However, for certain data mining tasks, such as finding potential causes of a disease or side effects of a drug, missing a small piece of information can have grave consequences, and the coverage of automatically acquired knowledge bases is often insufficient. This paper explores the use of automatic hypothesis generation for expanding a knowledge base of semantic relations, using distributional word similarities obtained from a large Web corpus. If successful, such a method can drastically improve the coverage of automatically acquired semantic relations, at the expense of a slight reduction in accuracy. We show that large-scale similarity-based relation expansion works well for this purpose. Using a corpus of 100 million Japanese Web pages as input, we generated a substantial number of new semantic relations that were not found in the input corpus but whose validity was confirmed in a much larger Web corpus, i.e., by using a commercial Web search engine.
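
The abstract does not specify the expansion procedure, so the following is only a minimal sketch of the general idea of similarity-based hypothesis generation, under stated assumptions. It assumes relations are stored as word pairs and that each word has a precomputed list of distributionally similar neighbors; the function name `expand_relations`, the `top_k` cutoff, the product-of-similarities score, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import product

def expand_relations(seeds, similar, top_k=2):
    """Generate hypothesized relation pairs by substituting similar words.

    seeds:   set of (x, y) pairs already in the knowledge base
    similar: dict mapping a word to a list of (neighbor, similarity) pairs,
             assumed to be sorted by decreasing similarity
    """
    hypotheses = {}
    for x, y in seeds:
        # Each argument may stay as it is (similarity 1.0) or be swapped
        # for one of its top_k neighbors.
        xs = [(x, 1.0)] + similar.get(x, [])[:top_k]
        ys = [(y, 1.0)] + similar.get(y, [])[:top_k]
        for (x2, sx), (y2, sy) in product(xs, ys):
            pair = (x2, y2)
            if pair in seeds:
                continue  # already known, not a new hypothesis
            # Assumed scoring: product of the two substitution similarities.
            score = sx * sy
            hypotheses[pair] = max(score, hypotheses.get(pair, 0.0))
    # Highest-scoring hypotheses first; ties broken alphabetically.
    return sorted(hypotheses.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data, invented for illustration only.
seeds = {("smoking", "lung cancer")}
similar = {
    "smoking": [("tobacco", 0.9)],
    "lung cancer": [("bronchitis", 0.6)],
}
ranked = expand_relations(seeds, similar)
for pair, score in ranked:
    print(pair, round(score, 2))
```

In a setting like the one described in the abstract, each generated pair would remain a hypothesis until it is checked against independent evidence, for example a larger corpus.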
