International Conference on Brain-Inspired Cognitive Systems

Self-validated Story Segmentation of Chinese Broadcast News



Abstract

Automatic story segmentation is an important prerequisite for semantic-level applications. The normalized cuts (NCuts) method has recently shown great promise for segmenting English spoken lectures. However, its assumption that the exact number of stories per file is known in advance significantly limits its ability to handle large collections of transcripts. Moreover, it remains unclear how to apply the method to Chinese in the presence of speech recognition errors. Addressing these two problems, we propose a self-validated NCuts (SNCuts) algorithm for segmenting Chinese broadcast news from the inaccurate lexical cues produced by a Chinese large-vocabulary continuous speech recognizer (LVCSR). Owing to the characteristics of the Chinese language, we present a subword-level graph embedding for the erroneous LVCSR transcripts. We regularize the NCuts criterion with a general exponential prior on the number of stories, in keeping with the principle of Occam's razor. Given only the maximum story number as a general parameter, the algorithm automatically produces reasonable segmentations for large collections of news transcripts, determining the story number for each file on its own, at a complexity comparable to alternative non-self-validated methods. Extensive experiments on a benchmark corpus show that: (i) the proposed SNCuts algorithm can efficiently produce segmentation quality comparable to, or even better than, other state-of-the-art methods that take the true story number as an input parameter; and (ii) the subword-level embedding consistently helps to recover lexical cohesion in erroneous Chinese transcripts, improving both segmentation accuracy and robustness to LVCSR errors.
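The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the two ideas it names: subword-level (character-bigram) features in place of error-prone word tokens, and self-validation of the story count via a per-segment penalty standing in for the exponential prior. It replaces the paper's spectral NCuts with a simple dynamic program over contiguous segments that maximizes normalized within-segment association (the complement of the NCut objective); all identifiers and the penalty form are our assumptions, not the paper's.

```python
# Illustrative sketch only -- NOT the paper's SNCuts implementation.
from collections import Counter
from math import sqrt

def char_bigrams(sent):
    # Subword-level features: character bigrams are more robust than words
    # to Chinese LVCSR errors (wrong word boundaries, substituted characters).
    return Counter(sent[i:i + 2] for i in range(len(sent) - 1))

def cosine(c1, c2):
    num = sum(v * c2[k] for k, v in c1.items())
    den = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

def segment(sents, k_max, lam=0.3):
    """Split sents into contiguous stories, choosing the story count
    (up to k_max) automatically -- the 'self-validated' part."""
    n = len(sents)
    feats = [char_bigrams(s) for s in sents]
    W = [[cosine(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]

    def cohesion(a, b):
        # Normalized association of block [a, b): within-weight / total-weight.
        # Maximizing summed cohesion is the complement of minimizing NCut.
        assoc = sum(deg[i] for i in range(a, b))
        within = sum(W[i][j] for i in range(a, b) for j in range(a, b))
        return within / assoc if assoc else 0.0

    k_max = min(k_max, n)
    NEG = float("-inf")
    # dp[k][b] = best total cohesion splitting the first b sentences into k blocks
    dp = [[NEG] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    dp[0][0] = 0.0
    for k in range(1, k_max + 1):
        for b in range(k, n + 1):
            for a in range(k - 1, b):
                score = dp[k - 1][a] + cohesion(a, b)
                if score > dp[k][b]:
                    dp[k][b], back[k][b] = score, a
    # Self-validation: the penalty lam * k plays the role of the exponential
    # prior on story numbers, discouraging needlessly many segments.
    best_k = max(range(1, k_max + 1), key=lambda k: dp[k][n] - lam * k)
    bounds, b = [], n
    for k in range(best_k, 0, -1):
        a = back[k][b]
        bounds.append((a, b))
        b = a
    return bounds[::-1]
```

On a toy transcript whose first two sentences share stock-market bigrams and whose last two share weather bigrams, `segment(sents, 4)` picks two stories with the boundary between them, even though it is only told the maximum count. The cubic-time DP is far cruder than the paper's method; it exists only to make the cohesion-versus-prior trade-off concrete.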
