Journal of Emerging Technologies in Web Intelligence

A Novel Method of Significant Words Identification in Text Summarization


Abstract

Text summarization is a process that reduces the size of a text document and extracts its significant sentences. We present a novel technique for text summarization. The originality of the technique lies in exploiting local and global properties of words to identify significant words. The local property of a word is a weighted sum of two quantities: its normalized term frequency and the normalized number of sentences containing it, each multiplied by its own weight. If the local score of a word is less than the local score threshold, we remove that word. The global property is the maximum semantic similarity between a word and the title words. We also introduce an iterative algorithm to identify significant words. This algorithm converges to a fixed number of significant words after some iterations, and the number of iterations depends strongly on the text document. We used a two-layered backpropagation neural network with three neurons in the hidden layer to calculate the weights. The results show that this technique performs better than the MS-Word 2007, baseline, and GistSumm summarizers.
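The local and global properties described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the quantities are normalized, so the sketch assumes term frequency is divided by the highest term frequency in the document and the sentence count is divided by the total number of sentences. The function names, the default weights of 0.5 (the paper obtains its weights from a neural network), the threshold value, and the `similarity` callback are all hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, Dict, List


def local_scores(sentences: List[List[str]],
                 w_tf: float = 0.5,
                 w_sf: float = 0.5) -> Dict[str, float]:
    """Local score = w_tf * normalized TF + w_sf * normalized sentence count.

    `sentences` is a list of tokenized sentences. The weights are
    illustrative defaults, not values from the paper.
    """
    # Term frequency: how often each word occurs in the whole document.
    tf = Counter(word for sent in sentences for word in sent)
    # Sentence frequency: in how many sentences each word occurs.
    sf = Counter(word for sent in sentences for word in set(sent))
    if not tf:
        return {}
    max_tf = max(tf.values())  # assumption: normalize by the top frequency
    n_sent = len(sentences)    # assumption: normalize by sentence count
    return {w: w_tf * tf[w] / max_tf + w_sf * sf[w] / n_sent for w in tf}


def filter_by_local_score(scores: Dict[str, float],
                          threshold: float) -> Dict[str, float]:
    """Drop every word whose local score is below the threshold."""
    return {w: s for w, s in scores.items() if s >= threshold}


def global_score(word: str,
                 title_words: List[str],
                 similarity: Callable[[str, str], float]) -> float:
    """Maximum similarity between `word` and any title word.

    `similarity` stands in for a semantic similarity measure; the
    abstract does not name the one used, so the caller supplies it.
    """
    return max((similarity(word, t) for t in title_words), default=0.0)
```

For example, given the two tokenized sentences `["text", "summarization", "reduces", "text"]` and `["summarization", "extracts", "sentences"]`, the word "summarization" scores 1.0 and "text" scores 0.75 under these assumptions, so a hypothetical threshold of 0.6 keeps only those two words.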
