Journal: Computational Linguistics

A Graph-Theoretic Framework for Semantic Distance


Abstract

Many NLP applications entail that texts are classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to that of documents of known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between two pieces of text. However, such approaches do not take into account the semantic relations between words. In this article, we introduce an alternative method of measuring the semantic distance between texts that integrates distributional information and ontological knowledge within a network flow formalism. We first represent each text as a collection of frequency-weighted concepts within an ontology. We then make use of a network flow method which provides an efficient way of explicitly measuring the frequency-weighted ontological distance between the concepts across two texts. We evaluate our method in a variety of NLP tasks, and find that it performs well on two of three tasks. We develop a new measure of semantic coherence that enables us to account for the performance difference across the three data sets, shedding light on the properties of a data set that lends itself well to our method.
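The two steps described in the abstract (a frequency-weighted concept profile per text, then a network flow between the two profiles over the ontology) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy ontology, the unit edge costs, the function names `concept_profile` and `semantic_distance`, and the choice of a successive-shortest-paths minimum-cost flow solver are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy ontology with undirected, unit-cost edges (an assumption).
ONTOLOGY_EDGES = [
    ("entity", "animal"), ("entity", "vehicle"),
    ("animal", "dog"), ("animal", "cat"),
    ("vehicle", "car"), ("vehicle", "truck"),
]


def concept_profile(counts):
    """Normalize raw concept counts into exact frequency weights summing to 1."""
    total = sum(counts.values())
    return {concept: Fraction(n, total) for concept, n in counts.items()}


def semantic_distance(counts_a, counts_b, ontology_edges=ONTOLOGY_EDGES):
    """Minimum cost of moving text A's concept mass onto text B's profile.

    Every concept in the counts must appear in ontology_edges.
    """
    supply, demand = concept_profile(counts_a), concept_profile(counts_b)
    concepts = sorted({c for edge in ontology_edges for c in edge})
    index = {c: i + 2 for i, c in enumerate(concepts)}  # node 0 = source, 1 = sink
    n = len(concepts) + 2
    graph = [[] for _ in range(n)]  # arcs: [target, capacity, cost, reverse_pos]

    def add_arc(u, v, capacity, cost):
        graph[u].append([v, capacity, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for c, weight in supply.items():
        add_arc(0, index[c], weight, 0)
    for c, weight in demand.items():
        add_arc(index[c], 1, weight, 0)
    for a, b in ontology_edges:
        # Capacity 1 equals the total mass, so ontology arcs never bind.
        add_arc(index[a], index[b], Fraction(1), 1)
        add_arc(index[b], index[a], Fraction(1), 1)

    total_cost = Fraction(0)
    while True:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist, prev = [None] * n, [None] * n
        dist[0] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] is None:
                    continue
                for pos, (v, capacity, cost, _rev) in enumerate(graph[u]):
                    if capacity > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                        dist[v], prev[v], updated = dist[u] + cost, (u, pos), True
            if not updated:
                break
        if dist[1] is None:  # all mass has been routed
            return total_cost
        # Find the bottleneck capacity along the path, then push that much flow.
        push, v = None, 1
        while v != 0:
            u, pos = prev[v]
            capacity = graph[u][pos][1]
            push = capacity if push is None else min(push, capacity)
            v = u
        v = 1
        while v != 0:
            u, pos = prev[v]
            arc = graph[u][pos]
            arc[1] -= push
            graph[v][arc[3]][1] += push
            v = u
        total_cost += push * dist[1]


text_a = {"dog": 2, "cat": 1, "car": 1}
text_b = {"cat": 3, "truck": 1}
print(semantic_distance(text_a, text_b))  # prints 3/2
```

In this toy setting, half of text A's mass moves from "dog" to "cat" (two edges away) and a quarter moves from "car" to "truck" (two edges away), giving a distance of 3/2. A purely distributional comparison would treat "dog" and "cat" as unrelated, whereas the flow cost reflects how close they sit in the ontology.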
