An Investigation of Word Embeddings with Deep Bidirectional LSTM for Sentence Unit Detection in Automatic Speech Transcription


Abstract

This work investigates the effectiveness of word-based and sub-word-based embedding representations as input to a deep bidirectional Long Short-Term Memory (LSTM) network for Sentence Unit Detection (SUD) in Automatic Speech Recognition (ASR) transcription. Our experimental results show that sub-word-based embeddings can significantly improve SUD performance when only limited text is available to train both the word embedding and the SUD model. The SUD model using the sub-word-based embedding gains up to 2.07% absolute improvement in F1-score over the best model trained with the word-based embedding. When tested under a domain-mismatch condition, the SUD model with sub-word-based embedding trained on in-domain data gives approximately 2% and 1% improvement over the best model using out-of-domain embedding, on reference transcription and on ASR transcription with a 29.5% Word Error Rate, respectively.
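The abstract reports results as F1-score but does not spell out how it is computed. As a minimal sketch, assuming SUD is scored as a per-word binary labelling task (1 means a sentence unit ends after that word), the F1-score could be computed as below. The function name `boundary_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def boundary_f1(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """F1-score over per-word boundary labels (1 = sentence unit ends here).

    Assumption: one binary label per word; this is an illustrative scoring
    scheme, not necessarily the one used by the authors.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly found boundaries
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # spurious boundaries
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed boundaries
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
reference = [0, 0, 1, 0, 1, 0, 0, 1]
predicted = [0, 0, 1, 0, 0, 0, 1, 1]
score = boundary_f1(reference, predicted)
```

An "absolute improvement" such as the 2.07% quoted above is a difference in percentage points between two such scores, not a relative change.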


Bibliographic record

Source
International Conference on Asian Language Processing | 2018 | pp. 139-142 | 4 pages
Authors
Thi-Nga Ho; Duy-Cat Can; EngSiong Chng
Original format: PDF
Keywords
Data models; Training; Task analysis; Vocabulary; Bidirectional control; Error analysis; Neural networks;


Similar literature


1. A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection [J]. Chenglin Xu, Lei Xie, Xiong Xiao. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, no. 7
2. Automatic proficiency assessment of Korean speech read aloud by non‐natives using bidirectional LSTM‐based speech recognition [J]. Yoo Rhee Oh, Kiyoung Park, Hyung‐Bae Jeon. ETRI Journal. 2020, no. 5
3. Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention [J]. Degen HUANG, Anil AHMED, Syed Yasser ARAFAT. IEICE Transactions on Information and Systems. 2020, no. 10
4. An Investigation of Word Embeddings with Deep Bidirectional LSTM for Sentence Unit Detection in Automatic Speech Transcription [C]. Thi-Nga Ho, Duy-Cat Can, EngSiong Chng. International Conference on Asian Language Processing. 2018
5. Parallel Sentence Detection in Comparable Corpora with Bilingual Word Embeddings for Low-Resource Languages [D]. Cadigan, John. 2018
6. LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites [O]. Guohua Huang, Qingfeng Shen, Guiyang Zhang. 2021
7. Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC [O]. Sakshi Gupta, Ravi S., Rajesh K. 2020
8. Modeling words with subword units in an articulatorily constrained speech recognition algorithm [R]. Hogden, J. 1997
