Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Broadcast News Story Segmentation Using Probabilistic Latent Semantic Analysis and Laplacian Eigenmaps

Abstract

This paper proposes integrating probabilistic latent semantic analysis (PLSA) and Laplacian Eigenmaps (LE) for broadcast news story segmentation. PLSA can address synonymy and polysemy problems by exploring the underlying semantic relations beneath the actual occurrences of words. LE can provide a data transformation that preserves the original temporal structure of sentence cohesive relations. We adopt PLSA statistics in place of term frequency as the representation of sentences and use them to measure the connective strength between sentences. LE analysis is then performed on the connective strength matrix so that the sentence relations become geometrically evident for discriminating different stories. A dynamic programming (DP) algorithm is used for story boundary identification. Experiments show that the proposed method achieves superior story segmentation performance, with the highest F1-measure of 0.7536 on the TDT2 Mandarin BN corpus.
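
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: each sentence is taken to be already represented by a PLSA topic distribution (the rows of `topic_vecs`), connective strength is taken to be cosine similarity, the DP cost is within-segment scatter in the embedded space, and the number of stories is given in advance. The function names are hypothetical.

```python
import numpy as np

def connective_strength(topic_vecs):
    """Cosine similarity between sentence topic vectors.
    Assumption: rows are PLSA topic distributions, one per sentence."""
    norms = np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    unit = topic_vecs / np.maximum(norms, 1e-12)
    return unit @ unit.T

def laplacian_eigenmaps(W, dim):
    """Embed sentences by solving L y = lambda D y for the similarity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back to generalized eigenvectors and skip the trivial first one
    return d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]

def segment_dp(Y, n_segments):
    """Dynamic programming over contiguous segments, minimizing the total
    squared distance of embedded sentences to their segment mean.
    Returns the index of the first sentence of each story after the first."""
    n = len(Y)
    # Prefix sums give each segment's cost in constant time
    s1 = np.vstack([np.zeros(Y.shape[1]), np.cumsum(Y, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((Y ** 2).sum(axis=1))])

    def cost(i, j):  # scatter of sentences i .. j-1
        seg_sum = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg_sum @ seg_sum / (j - i)

    best = np.full((n_segments + 1, n + 1), np.inf)
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    back[k, j] = i
    # Walk the back-pointers from the last segment to the first
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        j = int(back[k, j])
        bounds.append(j)
    return sorted(bounds)[1:]  # drop the leading 0
```

Calling `segment_dp(laplacian_eigenmaps(connective_strength(topic_vecs), dim), n_segments)` chains the three steps. The abstract does not say how the method handles temporal structure or how it chooses the number of stories, so this sketch leaves both out.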