ACM Transactions on Asian and Low-Resource Language Information Processing

Enhanced Language Modeling with Proximity and Sentence Relatedness Information for Extractive Broadcast News Summarization



Abstract

The primary task of extractive summarization is to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important theme of the original document. Recently, language modeling (LM) has been proven to be a promising modeling framework for performing this task in an unsupervised manner. However, there still remain three fundamental challenges facing the existing LM-based methods, which we set out to tackle in this article. The first one is how to construct a more accurate sentence model in this framework without resorting to external sources of information. The second is how to take into account sentence-level structural relationships, in addition to word-level information within a document, for important sentence selection. The last one is how to exploit the proximity cues inherent in sentences to obtain a more accurate estimation of respective sentence models. Specifically, for the first and second challenges, we explore a novel, principled approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to render sentence-level structural relationships within the document, leading to better summarization effectiveness. For the third challenge, we investigate several formulations of proximity cues for use in sentence modeling involved in the LM-based summarization framework, free of the strict bag-of-words assumption. Furthermore, we also present various ensemble methods that seamlessly integrate proximity and sentence relatedness information into sentence modeling. Extensive experiments conducted on a Mandarin broadcast news summarization task show that such integration of proximity and sentence relatedness information is indeed beneficial for speech summarization. Our proposed summarization methods can significantly boost the performance of an LM-based strong baseline (e.g., with a maximum ROUGE-2 improvement of 26.7% relative) and also outperform several state-of-the-art unsupervised methods compared in the article.
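The abstract does not give the underlying formulas, but the unsupervised LM-based framework it builds on typically ranks each sentence by the likelihood of the whole document under that sentence's unigram language model, then selects the top-ranked sentences as the summary. The Python sketch below illustrates only that generic baseline criterion, with Jelinek-Mercer smoothing against a document-level background model. The function names, the interpolation weight lam, and the smoothing choice are illustrative assumptions, not the authors' exact formulation, which additionally incorporates proximity cues and overlapped-cluster sentence-relatedness information on top of such a baseline.

from collections import Counter
from math import log

def sentence_lm_scores(sentences, lam=0.5):
    """Score each sentence by the log-likelihood of the whole document
    under a smoothed unigram sentence language model (a generic LM-based
    summarization baseline; lam and the smoothing scheme are assumptions)."""
    doc_counts = Counter(w for s in sentences for w in s)
    doc_len = sum(doc_counts.values())
    scores = []
    for s in sentences:
        sent_counts = Counter(s)
        sent_len = len(s) or 1
        score = 0.0
        for w, c_doc in doc_counts.items():
            p_sent = sent_counts[w] / sent_len      # maximum-likelihood sentence model
            p_back = doc_counts[w] / doc_len        # document-level background model
            # log P(D | S) accumulated word by word (bag-of-words assumption)
            score += c_doc * log(lam * p_sent + (1.0 - lam) * p_back)
        scores.append(score)
    return scores

def extract_summary(sentences, k=3):
    """Return the k highest-scoring sentences in their original order."""
    scores = sentence_lm_scores(sentences)
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

Because the baseline scores each word independently, it is exactly the strict bag-of-words assumption that the article's proximity-based sentence models are designed to relax, and the maximum-likelihood sentence model p_sent is what the overlapped-cluster relatedness information is meant to estimate more accurately.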
