Venue: International Conference on Asian Language Processing

An Investigation of Word Embeddings with Deep Bidirectional LSTM for Sentence Unit Detection in Automatic Speech Transcription

Abstract

This work investigates word-based and sub-word-based embedding representations as input to a deep bidirectional Long Short-Term Memory (LSTM) network for Sentence Unit Detection (SUD) in Automatic Speech Recognition (ASR) transcripts. Our experimental results show that sub-word-based embeddings significantly improve SUD performance when only limited text is available to train both the word embeddings and the SUD model. The SUD model using sub-word-based embeddings gains up to 2.07% absolute in F1-score over the best model trained with word-based embeddings. Under a domain-mismatch condition, the SUD model with sub-word-based embeddings trained on in-domain data improves on the best model using out-of-domain embeddings by approximately 2% on reference transcripts and approximately 1% on ASR transcripts with a 29.5% Word Error Rate.
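
To make the reported metric concrete, the following is a minimal sketch of how a boundary-level F1-score and an absolute improvement between two systems could be computed. It assumes SUD is framed as per-token binary labeling (1 for a token that ends a sentence unit, 0 otherwise), which the abstract does not state explicitly. The function name `boundary_f1` and the toy label sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def boundary_f1(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """F1-score for the boundary class (label 1) over aligned token labels."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r == 1 and p == 1)
    false_pos = sum(1 for r, p in pairs if r == 0 and p == 1)
    false_neg = sum(1 for r, p in pairs if r == 1 and p == 0)
    if true_pos == 0:
        # No correctly detected boundary: F1 is defined as 0 here.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (not data from the paper).
reference = [0, 0, 1, 0, 0, 0, 1, 0, 1]
system_a = [0, 0, 1, 0, 1, 0, 0, 0, 1]  # one misplaced boundary
system_b = [0, 0, 1, 0, 0, 0, 1, 0, 0]  # one missed boundary

f1_a = boundary_f1(reference, system_a)
f1_b = boundary_f1(reference, system_b)

# "Absolute improvement" is the plain difference in F1, in percentage points.
absolute_gain_points = (f1_b - f1_a) * 100
print(f"F1 A = {f1_a:.4f}, F1 B = {f1_b:.4f}, gain = {absolute_gain_points:.2f} points")
```

Read this way, a 2.07% absolute improvement means the F1-score itself rose by 2.07 percentage points, not that it rose by 2.07% of its previous value.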
