Journal: Expert Systems with Applications

A topic modeling based approach to novel document automatic summarization



Abstract

Most existing automatic text summarization algorithms target collections of relatively short documents and are therefore difficult to apply directly to novel documents, which are long and free in structure. In this paper, we propose a topic-modeling-based approach to extractive automatic summarization of novel documents that aims for a good balance among compression ratio, summarization quality, and machine readability. First, using topic modeling, we extract candidate sentences associated with topic words from a preprocessed novel document. Second, with compression ratio and topic diversity as goals, we design an importance evaluation function that selects the most important candidate sentences and thus generates an initial summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, which improves its readability. We evaluate the proposed approach experimentally on a real novel dataset. The results show that, compared with summaries produced by the other candidate algorithms, each summary generated by our approach has not only a higher compression ratio but also better summarization quality. (C) 2017 Elsevier Ltd. All rights reserved.
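
The first two steps of the abstract (candidate extraction, then selection under compression and topic-diversity goals) can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not the paper's method: the topic words are passed in by hand rather than learned by a topic model, the function names `tokenize` and `summarize` and the `max_ratio` parameter are hypothetical, and the greedy "new topic words per word" score is only a stand-in for the importance evaluation function, which the abstract does not define. The smoothing step is not modeled.

```python
from typing import Dict, List, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase a sentence and return its set of alphabetic tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in sentence)
    return set(cleaned.split())


def summarize(sentences: List[str],
              topics: Dict[str, Set[str]],
              max_ratio: float = 0.5) -> List[str]:
    """Greedy extractive selection under a word budget (illustrative only).

    Assumed score: number of not-yet-covered topic words in a sentence,
    divided by the sentence's word count. This is a hypothetical stand-in
    for the importance evaluation function mentioned in the abstract.
    """
    total_words = sum(len(s.split()) for s in sentences)
    budget = max_ratio * total_words          # proxy for the compression goal
    topic_words = set().union(*topics.values())

    # Step 1: candidates are sentences containing at least one topic word.
    candidates = {}
    for i, s in enumerate(sentences):
        hits = tokenize(s) & topic_words
        if hits:
            candidates[i] = hits

    chosen: List[int] = []
    covered: Set[str] = set()
    used = 0

    def gain(i: int) -> float:
        # Rewards topic words not covered yet (proxy for topic diversity).
        return len(candidates[i] - covered) / len(sentences[i].split())

    # Step 2: repeatedly take the candidate with the best gain per word.
    while candidates:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break                             # nothing new left to cover
        hits = candidates.pop(best)
        length = len(sentences[best].split())
        if used + length > budget:
            continue                          # too long; try the next one
        chosen.append(best)
        covered |= hits
        used += length

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

In a fuller pipeline, the `topics` argument would presumably be filled with the top words of each topic produced by a topic model fitted on the preprocessed text; the sketch leaves that step out so that it stays self-contained.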
