IEEE International Conference on Acoustics, Speech and Signal Processing

Measuring semantic similarity by contextual word connections in Chinese news story segmentation

Abstract

Much recent work on story segmentation focuses on developing better partitioning criteria for segmenting news transcripts into sequences of topically coherent stories, while relying on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labels, or story boundaries. We show that contextual word connections can help produce a semantically meaningful similarity measure between any pair of Chinese words. Based on this, we use a parallel all-pair SimRank algorithm to propagate these contextual similarities throughout the whole vocabulary. The resulting word semantic similarity matrix is then used to refine the classical cosine similarity measure for sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measure consistently produces better segmentation accuracy than using the hard similarity. Specifically, we achieve a 3%–10% average F1-measure improvement over state-of-the-art NCuts-based story segmentation.
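
The pipeline described in the abstract (contextual word connections, SimRank propagation, then a refined sentence cosine) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the window-based co-occurrence graph, the serial matrix-form SimRank (the paper uses a parallel all-pair variant), the decay value of 0.8, the soft-cosine form u^T S v / sqrt(u^T S u * v^T S v), and all function names and the toy English tokens.

```python
import numpy as np

def cooccurrence_graph(sentences, vocab, window=2):
    """Undirected word graph: edge weight = co-occurrence count within a window.
    (Assumed stand-in for the paper's contextual word connections.)"""
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for c in tokens[i + 1 : i + 1 + window]:
                if w != c:
                    adj[index[w], index[c]] += 1.0
                    adj[index[c], index[w]] += 1.0
    return adj

def simrank(adj, decay=0.8, iters=10):
    """Matrix-form SimRank: S <- decay * W^T S W, with the diagonal reset to 1."""
    col_sums = adj.sum(axis=0, keepdims=True)
    # Column-normalize the adjacency; words with no neighbors keep a zero column.
    W = np.divide(adj, col_sums, out=np.zeros_like(adj, dtype=float),
                  where=col_sums > 0)
    S = np.eye(adj.shape[0])
    for _ in range(iters):
        S = decay * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)
    return S

def soft_cosine(u, v, S):
    """Cosine similarity of two bag-of-words vectors under word-similarity matrix S."""
    den = np.sqrt((u @ S @ u) * (v @ S @ v))
    return float(u @ S @ v / den) if den > 0 else 0.0

# Toy corpus: "stocks" and "shares" never co-occur but share the context "rose".
vocab = ["stocks", "shares", "rose", "weather"]
sentences = [["stocks", "rose"], ["shares", "rose"]]
S = simrank(cooccurrence_graph(sentences, vocab))

u = np.array([1.0, 0.0, 0.0, 0.0])  # sentence containing only "stocks"
v = np.array([0.0, 1.0, 0.0, 0.0])  # sentence containing only "shares"
hard = soft_cosine(u, v, np.eye(len(vocab)))  # identity S gives the classical cosine
soft = soft_cosine(u, v, S)
```

In this toy case the hard cosine is zero because the two sentences share no word, while the soft cosine is positive because SimRank assigns "stocks" and "shares" a nonzero similarity through their shared context.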
