Conference: International Conference on Technologies and Applications of Artificial Intelligence

Automatic Punctuation Restoration for corpus in Traditional Chinese Language using Deep Learning


Abstract

Automatic Speech Recognition (ASR) has already been applied in several chat apps, allowing people to dictate messages instead of typing them by hand. ASR has also been used to transcribe meeting minutes from audio recordings. However, two main problems make ASR output unsuitable for some formal situations: wrong words caused by misrecognition, and missing punctuation marks. Both degrade readability and may convey the wrong meaning. In this work, we aim to build a model that automatically restores punctuation marks in the corpus generated by ASR systems. Because labeled data of this kind is not available for our ASR corpus, we train and test our model entirely on the corresponding transcript data. This research focuses on automatic punctuation restoration for a Traditional Chinese corpus using neural network models. Our results show that a bidirectional Gated Recurrent Unit (GRU) with an attention mechanism outperforms the other models on our punctuation restoration task when the amount of training data is limited.
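Training on punctuated transcripts and then applying the model to unpunctuated ASR output is commonly framed as per-character sequence labeling: strip the punctuation from each transcript and tag every remaining character with the mark that follows it. The sketch below shows only that data preparation and the inverse step that reinserts predicted marks. The label set (`COMMA`, `PERIOD`, `QUESTION`, `EXCLAMATION`, `O` for no mark) and the function names `to_training_pairs` and `restore` are hypothetical; the abstract does not state which marks or labeling scheme the authors used.

```python
# Hypothetical label set: each character is tagged with the full-width mark
# that follows it, or "O" when no mark follows.
PUNCTUATION = {"，": "COMMA", "。": "PERIOD", "？": "QUESTION", "！": "EXCLAMATION"}
LABEL_TO_MARK = {label: mark for mark, label in PUNCTUATION.items()}


def to_training_pairs(text):
    """Strip punctuation from a transcript and label each remaining character
    with the mark that followed it (consecutive marks: the last one wins)."""
    chars, labels = [], []
    for ch in text:
        if ch in PUNCTUATION:
            if chars:  # attach the mark to the preceding character
                labels[-1] = PUNCTUATION[ch]
        elif not ch.isspace():
            chars.append(ch)
            labels.append("O")
    return chars, labels


def restore(chars, labels):
    """Rebuild punctuated text from characters and (predicted) labels."""
    return "".join(ch + LABEL_TO_MARK.get(label, "") for ch, label in zip(chars, labels))


if __name__ == "__main__":
    chars, labels = to_training_pairs("今天開會，請準時出席。")
    print(list(zip(chars, labels)))
    print(restore(chars, labels))
```

In this framing, a sequence model such as the bidirectional GRU with attention reported above would be trained to predict `labels` from `chars`, and `restore` would then reinsert the predicted marks into the unpunctuated ASR output.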