Conference: International Conference on Audio, Language and Image Processing (ICALIP 2010)
Laplacian Eigenmaps for Automatic News Story Segmentation

Abstract

This paper presents a novel lexical-similarity-based approach to automatic story segmentation in broadcast news. When measuring the connection between a pair of sentences, we take two factors into account: their lexical similarity and the distance between them in the text stream. The pairwise connections between sentences are then analyzed with the technique of Laplacian Eigenmaps (LE). Taking advantage of the LE algorithm, we construct a Euclidean space in which each sentence is mapped to a vector, so that the original connective strength between sentences is reflected by the Euclidean distances between the corresponding vectors in the target space of the map. Further analysis of the map leads to a straightforward criterion for optimal segmentation. We then formalize story segmentation as a minimization problem and give a dynamic programming solution to it. Experimental results on the TDT2 corpus show that the proposed method outperforms several state-of-the-art lexical-similarity-based methods.
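The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so every concrete choice here is an assumption rather than the authors' method: connection strength is taken as cosine similarity multiplied by an exponential decay in sentence distance (with a hypothetical scale `sigma`), the embedding uses the standard Laplacian Eigenmaps eigenproblem, and the minimization criterion is assumed to be within-segment squared Euclidean scatter with a known number of stories `k`. The function names `affinity`, `laplacian_eigenmaps`, and `segment` are illustrative only.

```python
import numpy as np


def affinity(tf, sigma=5.0):
    """Connection strength between sentences (assumed form):
    cosine similarity of term vectors, damped by positional distance."""
    norms = np.linalg.norm(tf, axis=1, keepdims=True)
    unit = tf / np.maximum(norms, 1e-12)
    cos = unit @ unit.T
    idx = np.arange(len(tf))
    decay = np.exp(-np.abs(idx[:, None] - idx[None, :]) / sigma)
    w = cos * decay
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w


def laplacian_eigenmaps(w, dim=1):
    """Embed each sentence with the smallest nontrivial solutions of
    L y = lambda D y, where L = D - W. Assumes a connected graph."""
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    l_sym = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    # y = D^{-1/2} u recovers generalized eigenvectors; skip the trivial one
    return d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]


def segment(points, k):
    """Dynamic programming: split the sequence into k contiguous segments
    minimizing the summed squared distance to each segment mean.
    Returns the start indices of segments 2..k."""
    n = len(points)
    # Prefix sums give the scatter of any span [i, j) in constant time
    s1 = np.vstack([np.zeros(points.shape[1]), np.cumsum(points, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((points ** 2).sum(axis=1))])

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total @ total / (j - i)

    best = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = best[seg - 1, i] + cost(i, j)
                if c < best[seg, j]:
                    best[seg, j], back[seg, j] = c, i
    # Follow the back-pointers from the end to recover the boundaries
    bounds, j = [], n
    for seg in range(k, 1, -1):
        j = back[seg, j]
        bounds.append(int(j))
    return bounds[::-1]


if __name__ == "__main__":
    # Toy data (made up): two six-sentence "stories" with disjoint vocabularies
    # plus one shared word that keeps the sentence graph connected.
    rng = np.random.default_rng(0)
    tf = np.zeros((12, 9))
    tf[:6, :4] = rng.integers(2, 5, size=(6, 4))
    tf[6:, 4:8] = rng.integers(2, 5, size=(6, 4))
    tf[:, 8] = 1
    embedding = laplacian_eigenmaps(affinity(tf), dim=1)
    print(segment(embedding, 2))
```

On real transcripts the number of stories is not known in advance, so a practical variant would need a per-boundary penalty or some other stopping rule; the abstract does not say how the paper handles this.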
