Published in: Australasian Joint Conference on Artificial Intelligence

Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance


Abstract

Measuring similarity between sentences plays an important role in textual applications such as document summarization and question answering. While various sentence similarity measures have recently been proposed, these measures typically account for word importance only through inverse document frequency (IDF) weighting. IDF values are based on global information compiled over a large corpus of documents, and we hypothesise that at the sentence level better performance can be achieved by using a measure of the importance of a word within the sentence in which it appears. In this paper we show how the PageRank graph-centrality algorithm can be used to assign a numerical measure of importance to each word in a sentence, and how these values can be incorporated within various sentence similarity measures. Results from applying the measures to a difficult sentence clustering task demonstrate that incorporating sentential word importance leads to a statistically significant improvement in clustering performance, as evaluated using a range of external clustering criteria.
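
The abstract does not say how the word graph is built or which similarity measures are extended, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a TextRank-style graph in which two words are linked when they occur within a small window of each other, runs PageRank by power iteration to score each word, and then uses those scores as weights in a plain cosine similarity. The names word_importance and weighted_cosine, the window size, and the damping factor are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import math


def word_importance(tokens, window=2, damping=0.85, iters=50):
    """PageRank score for each distinct word of a single sentence.

    Assumption (not from the paper): two words are linked when they occur
    within `window` positions of each other, in the style of TextRank.
    """
    neighbours = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbours[w].add(tokens[j])
                neighbours[tokens[j]].add(w)

    words = sorted(set(tokens))
    n = len(words)
    if n == 0:
        return {}

    score = {w: 1.0 / n for w in words}
    for _ in range(iters):
        # Rank held by words with no links is spread uniformly over all words.
        dangling = sum(score[w] for w in words if not neighbours[w])
        new = {}
        for w in words:
            incoming = sum(score[v] / len(neighbours[v]) for v in neighbours[w])
            new[w] = (1 - damping) / n + damping * (incoming + dangling / n)
        score = new
    return score


def weighted_cosine(tokens_a, tokens_b):
    """Cosine similarity of two sentences using sentence-level PageRank
    scores as word weights (an illustrative choice of measure)."""
    wa, wb = word_importance(tokens_a), word_importance(tokens_b)
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "the cat lay on the rug".split()
    print(word_importance(s1))
    print(weighted_cosine(s1, s2))
```

Because the scores are computed per sentence, the same word can carry a different weight in each sentence, which is the contrast with corpus-level IDF that the abstract draws. A fuller version might multiply these scores by IDF or use weighted edges, but the abstract gives no detail on either point.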
