Encoding Distributional Semantics into Triple-Based Knowledge Ranking for Document Enrichment

Abstract

Document enrichment focuses on retrieving relevant knowledge from external resources, which is essential because text is generally replete with gaps. Since conventional work primarily relies on special resources, we instead use triples of Subject, Predicate, Object as knowledge and incorporate distributional semantics to rank them. Our model first extracts these triples automatically from raw text and converts them into real-valued vectors based on the word semantics captured by Latent Dirichlet Allocation. We then represent these triples, together with the source document that is to be enriched, as a graph of triples, and adopt a global iterative algorithm to propagate relevance weight from the source document to these triples so as to select the most relevant ones. Evaluated as a ranking problem, our model significantly outperforms multiple strong baselines. Moreover, we conduct a task-based evaluation by incorporating these triples as additional features into document classification, which enhances performance by 3.02%.
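The ranking step described in the abstract can be illustrated with a small sketch. The abstract does not name the propagation algorithm or the edge weighting, so the code below makes its own assumptions: a random walk with restart anchored on the source document, and cosine similarity between topic vectors as edge weights. The function names `cosine_similarity_matrix` and `rank_triples`, the `damping` parameter, and the toy vectors in the usage note are all hypothetical and not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def rank_triples(doc_vec, triple_vecs, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank triples by relevance weight propagated from the source document.

    Assumption: random walk with restart stands in for the paper's
    "global iterative algorithm", which the abstract does not specify.
    Inputs are assumed to be non-negative topic-distribution vectors.
    """
    # Node 0 is the source document; nodes 1..n are the candidate triples.
    nodes = np.vstack([doc_vec, triple_vecs])
    weights = cosine_similarity_matrix(nodes)
    np.fill_diagonal(weights, 0.0)  # no self-loops

    # Column-normalise so each node splits its weight among its neighbours.
    col_sums = weights.sum(axis=0, keepdims=True)
    transition = weights / np.clip(col_sums, 1e-12, None)

    # All restart mass sits on the source document.
    restart = np.zeros(len(nodes))
    restart[0] = 1.0

    scores = restart.copy()
    for _ in range(max_iter):
        updated = damping * (transition @ scores) + (1.0 - damping) * restart
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break

    triple_scores = scores[1:]
    order = np.argsort(-triple_scores)  # most relevant triple first
    return order, triple_scores
```

Calling `rank_triples(np.array([0.9, 0.1, 0.0]), np.array([[0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.5, 0.5, 0.0]]))` on three toy topic vectors ranks the triple whose topic mix is closest to the document first and the off-topic triple last. In a fuller pipeline the vectors would come from an LDA model fitted on the corpus, which this sketch leaves out.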
