Conference: International Conference on Acoustics, Speech and Signal Processing (ICASSP)

MEASURING SEMANTIC SIMILARITY BY CONTEXTUAL WORD CONNECTIONS IN CHINESE NEWS STORY SEGMENTATION

Abstract

Much recent work on story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while relying simply on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labels, or story boundaries. We show that contextual word connections can help produce semantically meaningful similarity measurements between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resulting word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measurement always produces better segmentation accuracy than using the hard similarity. Specifically, we achieve a 3% to 10% average F1-measure improvement over state-of-the-art NCuts-based story segmentation.
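The abstract names three computational steps:

- building contextual connections between words;
- propagating them over the vocabulary with SimRank;
- using the resulting word similarity matrix to soften sentence-level cosine similarity.

What follows is a minimal sketch of these steps on a toy corpus, under stated assumptions. It is not the authors' implementation.

- The abstract does not define contextual word connections. The hypothetical `build_context_graph` therefore just links words that co-occur within a small window.
- `simrank` is a naive serial iteration of the standard SimRank recurrence, not the parallel all-pair algorithm used in the paper.
- `soft_cosine` assumes the refinement takes the common soft-cosine form (x^T S y) / (sqrt(x^T S x) * sqrt(y^T S y)). This form reduces to the classical (hard) cosine when S is the identity.
- The parameters `window`, `c`, and `iterations` are illustrative choices.

```python
from collections import defaultdict
from math import sqrt


def build_context_graph(sentences, window=2):
    """Link two distinct words if they co-occur within `window` tokens.

    Hypothetical stand-in: the abstract does not define how contextual
    word connections are built.
    """
    neighbors = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] != w:
                    neighbors[w].add(tokens[j])
                    neighbors[tokens[j]].add(w)
    return neighbors


def simrank(neighbors, vocab, c=0.8, iterations=5):
    """Naive serial all-pair SimRank on an undirected word graph.

    s(a, a) = 1; s(a, b) = c / (|N(a)| |N(b)|) * sum of s(i, j)
    over i in N(a), j in N(b). `vocab` must contain every graph word.
    This is not the paper's parallel algorithm.
    """
    sim = {(a, b): 1.0 if a == b else 0.0 for a in vocab for b in vocab}
    for _ in range(iterations):
        updated = {}
        for a in vocab:
            for b in vocab:
                na, nb = neighbors.get(a, ()), neighbors.get(b, ())
                if a == b:
                    updated[(a, b)] = 1.0
                elif not na or not nb:
                    updated[(a, b)] = 0.0  # a word with no context links
                else:
                    total = sum(sim[(i, j)] for i in na for j in nb)
                    updated[(a, b)] = c * total / (len(na) * len(nb))
        sim = updated
    return sim


def soft_cosine(x, y, sim):
    """Cosine of bag-of-words dicts x and y, softened by word similarities.

    Pairs missing from `sim` default to 1.0 on the diagonal and 0.0
    elsewhere, so an empty `sim` gives the classic (hard) cosine.
    """
    def inner(u, v):
        return sum(
            u[a] * v[b] * sim.get((a, b), 1.0 if a == b else 0.0)
            for a in u for b in v
        )
    denom = sqrt(inner(x, x)) * sqrt(inner(y, y))
    return inner(x, y) / denom if denom else 0.0


if __name__ == "__main__":
    corpus = [
        ["stock", "market", "rises"],
        ["stock", "market", "falls"],
        ["team", "wins", "match"],
    ]
    vocab = sorted({w for s in corpus for w in s})
    S = simrank(build_context_graph(corpus), vocab)
    print(soft_cosine({"rises": 1}, {"falls": 1}, {}))     # 0.0 (hard)
    print(soft_cosine({"rises": 1}, {"falls": 1}, S) > 0)  # True (soft)
```

With an empty similarity table, the function falls back to exact-match (hard) cosine, so "rises" and "falls" score 0. With the SimRank matrix, the two words receive a positive score because they share the same contexts ("stock", "market"). Words from the unconnected sports sentence stay at 0.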
