Journal of signal processing systems for signal, image, and video technology

A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection



Abstract

Recovering sentence boundaries from speech and its transcripts is essential for readability and for downstream speech and language processing tasks. In this paper, we propose a deep recurrent neural network that detects sentence boundaries in broadcast news by modeling rich prosodic and lexical features extracted at each inter-word position. We introduce an unsupervised word embedding, learned with the Continuous Bag-of-Words (CBOW) model to represent word identity, as an effective feature for the sentence boundary detection task; this embedding captures syntactic information that is essential for the task. In addition, we propose two further low-dimensional word embeddings, learned by supervised training of a neural network that incorporates class and context information: one is extracted from the projection layer, the other from the last hidden layer. Furthermore, we propose a deep bidirectional Long Short-Term Memory (LSTM) architecture with Viterbi decoding for sentence boundary detection. Under this framework, the long-range dependencies of prosodic and lexical information in temporal sequences are modeled effectively. Compared with the previous state-of-the-art DNN-CRF method, the proposed LSTM approach reduces the NIST SU error by 24.8% and 9.8% relative on reference and recognition transcripts, respectively.
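The Viterbi decoding step over per-position boundary scores can be sketched as below. This is a minimal, self-contained illustration, not the paper's implementation: it assumes a BiLSTM has already produced emission scores for the labels "no boundary" (0) and "boundary" (1) at each inter-word position, and the `viterbi` function name, the example scores, and the transition matrix are all hypothetical.

```python
# Hypothetical sketch: Viterbi decoding over per-position label scores,
# as used on top of a BiLSTM for sentence boundary detection.
# Labels: 0 = no boundary, 1 = sentence boundary at this inter-word position.

def viterbi(emissions, transitions):
    """emissions: list of [score_label0, score_label1] per inter-word position
    (e.g., BiLSTM outputs); transitions[i][j]: score of moving from label i
    to label j. Returns the highest-scoring label sequence."""
    n_labels = len(emissions[0])
    # Initialize with the first position's emission scores.
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_scores = []
        step_back = []
        for j in range(n_labels):
            # Best previous label for reaching label j at this position.
            best_i = max(range(n_labels),
                         key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            step_back.append(best_i)
        scores = step_scores
        backpointers.append(step_back)
    # Trace the best path backwards from the highest final score.
    best = max(range(n_labels), key=lambda j: scores[j])
    path = [best]
    for step_back in reversed(backpointers):
        best = step_back[best]
        path.append(best)
    return list(reversed(path))

# Illustrative scores for 5 inter-word positions; the transition matrix
# penalizes two consecutive boundaries (an unlikely label pattern).
emissions = [[2.0, 0.5], [0.1, 3.0], [1.5, 0.2], [0.3, 0.4], [0.2, 2.5]]
transitions = [[0.5, 0.0], [0.2, -2.0]]
print(viterbi(emissions, transitions))  # → [0, 1, 0, 0, 1]
```

Decoding jointly over the whole sequence, rather than thresholding each position independently, lets the transition scores enforce plausible boundary patterns.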
