Journal of Information Science

Similarity versus relatedness: A novel approach in extractive Persian document summarisation

Abstract

Automatic text summarisation is the process of creating a summary from one or more documents by eliminating details and preserving the worthwhile information. This article presents a single/multi-document summariser that uses a novel clustering method to create summaries. First, a feature selection phase is employed. Then, FarsNet, the Persian WordNet, is utilised to extract the semantic information of words. Next, the input sentences are categorised into three main kinds of cluster: similarity, relatedness and coherency. Each similarity cluster contains sentences that are similar to its core, while each relatedness cluster contains sentences that are related (but not similar) to its core. The coherency clusters identify the sentences that should be kept together to preserve the coherency of the summary. Finally, the centroid of each similarity cluster with the highest feature score is added to an empty summary. The summary is then enlarged iteratively by including related sentences from the relatedness clusters and excluding sentences similar to its content. Coherency clusters are applied to the created summary in the last step. The proposed method has been compared with three existing text summarisation systems and techniques for the Persian language: FarsiSum, Parsumist and Ijaz. The proposed method improves experimental results on several measures, including precision, recall, F-measure, ROUGE-N and ROUGE-L.
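
The summary-building steps described in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the function name `build_summary`, the dictionary-based cluster representation, the `max_len` cut-off and the breadth-first expansion order are all assumptions, and the clusters are taken as precomputed inputs rather than derived from FarsNet.

```python
def build_summary(scores, similar, related, coherent, max_len):
    """Hypothetical sketch of the three-cluster summary construction.

    scores   : dict, sentence id -> feature score
    similar  : dict, core id -> set of ids similar to that core
    related  : dict, core id -> set of ids related (but not similar) to it
    coherent : list of sets of ids that should be kept together
    max_len  : assumed cap on sentences chosen before the coherency step
    """
    # Step 1: seed the empty summary with the highest-scoring
    # similarity-cluster core (one possible reading of the abstract).
    seed = max(similar, key=lambda s: scores[s])
    summary = [seed]
    excluded = set(similar.get(seed, ()))  # near-duplicates of the seed

    # Step 2: grow the summary iteratively with related sentences,
    # skipping anything similar to what is already included.
    frontier = [seed]
    while frontier and len(summary) < max_len:
        core = frontier.pop(0)
        candidates = sorted(related.get(core, ()),
                            key=lambda s: (-scores[s], s))
        for s in candidates:
            if len(summary) >= max_len:
                break
            if s in summary or s in excluded:
                continue
            summary.append(s)
            excluded |= similar.get(s, set())
            frontier.append(s)

    # Step 3: apply coherency clusters. If any member of a group was
    # chosen, pull in the whole group (this may exceed max_len).
    chosen = set(summary)
    for group in coherent:
        if chosen & group:
            chosen |= group

    # Return sentence ids in document order.
    return sorted(chosen)


# Toy data: six sentences identified by their position in the document.
scores = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6, 4: 0.5, 5: 0.1}
similar = {0: {1}, 2: {3}}
related = {0: {1, 2}, 2: {3, 4}}
coherent = [{4, 5}]

print(build_summary(scores, similar, related, coherent, max_len=3))
# -> [0, 2, 4, 5]
```

In the toy run, sentence 1 is skipped because it is similar to the seed, sentence 3 is skipped because it is similar to sentence 2, and sentence 5 is added only because the coherency step keeps it together with sentence 4.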
