Conference: International Conference on Computer and Information Technology

Automatic Bengali news documents summarization by introducing sentence frequency and clustering


Abstract

This paper proposes a method for summarizing Bengali news documents that extracts significant sentences in four major steps: (a) preprocessing, (b) sentence ranking, (c) sentence clustering, and (d) summary generation. A notable feature of the method is its use of sentence frequency, which eliminates redundancy as a consequence. Another notable aspect is sentence clustering based on the similarity ratio among sentences. Summary sentences are selected from all clusters so that the summary covers as much information as possible even when that information is scattered across the input document. Two sets of human-generated summaries are used: one to train the system and the other to evaluate its performance. In comparison, the proposed method outperforms the latest state-of-the-art method for Bengali news document summarization. The performance evaluation shows average Precision, Recall, and F-measure values of 0.608, 0.664, and 0.632, respectively.
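The abstract does not define sentence frequency, the similarity ratio, or the unit used for Precision and Recall, so the following is only a minimal sketch of a four-step extractive pipeline under stated assumptions, not the authors' implementation. It assumes whitespace tokenization, Jaccard word overlap as a stand-in for the similarity ratio, cluster size as a stand-in for sentence frequency, and sentence-level overlap for scoring. All function names and the `threshold` parameter are hypothetical.

```python
import re


def split_sentences(text):
    # Preprocessing: Bengali sentences usually end with the danda "।";
    # "?" and "!" are also treated as sentence boundaries.
    parts = re.split(r"[।?!]", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a, b):
    # Jaccard overlap of word sets (an assumed stand-in for the
    # paper's "similarity ratio", which the abstract does not define).
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def cluster_sentences(sentences, threshold=0.5):
    # Greedy single-pass clustering: a sentence joins the first cluster
    # whose seed (first) sentence is similar enough, else starts a new one.
    clusters = []  # each cluster is a list of sentence indices
    for i, sentence in enumerate(sentences):
        for cluster in clusters:
            if similarity(sentences[cluster[0]], sentence) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def summarize(text, max_sentences=2, threshold=0.5):
    sentences = split_sentences(text)
    clusters = cluster_sentences(sentences, threshold)
    # Ranking: larger clusters first (cluster size plays the role of
    # "sentence frequency" here); ties keep document order.
    clusters.sort(key=lambda c: (-len(c), c[0]))
    # One representative per cluster, so near-duplicates are dropped
    # while different clusters still contribute to coverage.
    chosen = [c[0] for c in clusters[:max_sentences]]
    return [sentences[i] for i in sorted(chosen)]


def precision_recall_f(system, reference):
    # Sentence-level overlap between system and reference summaries
    # (an assumption; the paper may score at the word level instead).
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    p = overlap / len(sys_set) if sys_set else 0.0
    r = overlap / len(ref_set) if ref_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Calling `summarize(text)` returns the selected sentences in document order, and `precision_recall_f(summary, reference)` scores them against a human summary. As a side note, the harmonic mean of the reported average Precision and Recall is 2 × 0.608 × 0.664 / (0.608 + 0.664) ≈ 0.635, slightly above the reported 0.632. That gap would be expected if the F-measure was averaged per document, although the abstract does not say how it was computed.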
