ACM Transactions on Asian and Low-Resource Language Information Processing

Enhanced Language Modeling with Proximity and Sentence Relatedness Information for Extractive Broadcast News Summarization



Abstract

The primary task of extractive summarization is to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important theme of the original document. Recently, language modeling (LM) has been proven to be a promising modeling framework for performing this task in an unsupervised manner. However, there still remain three fundamental challenges facing the existing LM-based methods, which we set out to tackle in this article. The first one is how to construct a more accurate sentence model in this framework without resorting to external sources of information. The second is how to take into account sentence-level structural relationships, in addition to word-level information within a document, for important sentence selection. The last one is how to exploit the proximity cues inherent in sentences to obtain a more accurate estimation of respective sentence models. Specifically, for the first and second challenges, we explore a novel, principled approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to render sentence-level structural relationships within the document, leading to better summarization effectiveness. For the third challenge, we investigate several formulations of proximity cues for use in sentence modeling involved in the LM-based summarization framework, free of the strict bag-of-words assumption. Furthermore, we also present various ensemble methods that seamlessly integrate proximity and sentence relatedness information into sentence modeling. Extensive experiments conducted on a Mandarin broadcast news summarization task show that such integration of proximity and sentence relatedness information is indeed beneficial for speech summarization. Our proposed summarization methods can significantly boost the performance of an LM-based strong baseline (e.g., with a maximum ROUGE-2 improvement of 26.7% relative) and also outperform several state-of-the-art unsupervised methods compared in the article.
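The abstract does not give the underlying formulas, but the unsupervised LM-based framework it builds on typically ranks each sentence by the likelihood of the whole document under that sentence's unigram language model, then selects the top-ranked sentences as the summary. The Python sketch below illustrates only that generic baseline criterion, with Jelinek-Mercer smoothing against a document-level background model. The function names, the interpolation weight lam, and the smoothing choice are illustrative assumptions, not the authors' exact formulation, which additionally incorporates proximity cues and overlapped-cluster sentence-relatedness information on top of such a baseline.

from collections import Counter
from math import log

def sentence_lm_scores(sentences, lam=0.5):
    """Score each sentence by the log-likelihood of the whole document
    under a smoothed unigram sentence language model (a generic LM-based
    summarization baseline; lam and the smoothing scheme are assumptions)."""
    doc_counts = Counter(w for s in sentences for w in s)
    doc_len = sum(doc_counts.values())
    scores = []
    for s in sentences:
        sent_counts = Counter(s)
        sent_len = len(s) or 1
        score = 0.0
        for w, c_doc in doc_counts.items():
            p_sent = sent_counts[w] / sent_len      # maximum-likelihood sentence model
            p_back = doc_counts[w] / doc_len        # document-level background model
            # log P(D | S) accumulated word by word (bag-of-words assumption)
            score += c_doc * log(lam * p_sent + (1.0 - lam) * p_back)
        scores.append(score)
    return scores

def extract_summary(sentences, k=3):
    """Return the k highest-scoring sentences in their original order."""
    scores = sentence_lm_scores(sentences)
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

Because the baseline scores each word independently, it is exactly the strict bag-of-words assumption that the article's proximity-based sentence models are designed to relax, and the maximum-likelihood sentence model p_sent is what the overlapped-cluster relatedness information is meant to estimate more accurately.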
