Learning distributed sentence representations for story segmentation



Abstract

Traditional sentence representations such as bag-of-words (BOW) and term frequency-inverse document frequency (tf-idf) suffer from data sparsity and may not generalize well. Neural network based representations such as word/sentence vectors are usually trained in an unsupervised way and lack the topic information that is important for story segmentation. In this paper, we propose to learn sentence representations by using a deep neural network (DNN) to directly predict the topic class of the input sentence. With supervised training, the learned vector representation of sentences contains more topic information and is more suitable for the story segmentation task. The input of the DNN is a BOW vector computed from a context window. Multiple time resolution BOW and bottleneck features (BNF) are also introduced to enhance the performance of story segmentation. As text data labeled with topic information is limited, we cluster stories into classes and use the class ID as the topic label of the stories for DNN training. We evaluated the proposed sentence representation with the TextTiling and normalized cuts (NCuts) based story segmentation methods on the topic detection and tracking (TDT2) task. Experimental results show that the proposed topical sentence representation outperforms both the BOW baseline and the recently proposed neural network based representations, i.e., word and sentence vectors.
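To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of its ingredients: a BOW vector computed from a context window around each sentence, and a similarity score between adjacent sentences of the kind that TextTiling-style methods build on. It is not the authors' implementation. The DNN, the bottleneck features, the story clustering, and NCuts are all omitted, and the function names, the toy sentences, and the choice of cosine similarity between neighbouring vectors are assumptions made purely for illustration.

```python
import math
from collections import Counter


def context_bow(sentences, i, vocab, window=1):
    """Count-based BOW over sentences i-window .. i+window (clipped at the ends)."""
    lo = max(0, i - window)
    hi = min(len(sentences), i + window + 1)
    counts = Counter(tok for sent in sentences[lo:hi] for tok in sent)
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def adjacent_similarities(sentences, window=1):
    """Similarity between each pair of neighbouring sentence vectors.

    A low value suggests a topic shift, i.e. a candidate story boundary.
    """
    vocab = sorted({tok for sent in sentences for tok in sent})
    vecs = [context_bow(sentences, i, vocab, window) for i in range(len(sentences))]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


# Hypothetical toy input: two "stories" of two tokenized sentences each.
sentences = [
    ["stocks", "market", "fell"],
    ["market", "traders", "stocks"],
    ["storm", "rain", "coast"],
    ["rain", "flood", "coast"],
]

scores = adjacent_similarities(sentences, window=1)
boundary_after = scores.index(min(scores))  # index of the sentence before the gap
print(scores, boundary_after)
```

On this toy input the lowest similarity falls between the second and third sentences, where the topic changes. In the approach described in the abstract, the raw BOW vectors would instead be fed to a DNN trained to predict topic classes, and the learned representation would replace the BOW vectors in the similarity computation.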

Bibliographic details

  • Source
    Signal Processing | 2018, Issue 1 | pp. 403-411 | 9 pages
  • Author affiliations

    Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an, China, and School of Computer and Information Engineering, Luoyang Institute of Science and Technology, Luoyang, China;

    Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an, China;

    Temasek Laboratories@NTU, Nanyang Technological University, Singapore;

    Temasek Laboratories@NTU, Nanyang Technological University, Singapore;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Deep neural network; Distributed representation; Sentence vector; Topical sentence representation; Word vector;

