A Study on Diacritic Restoration Problem in Vietnamese Text using Deep Learning based Models


Abstract

Diacritic restoration is a challenging problem in natural language processing (NLP). With diacritic restoration, users can type text faster and more easily. Diacritic restoration also helps make use of texts with missing diacritics, which are normally discarded in many NLP applications. This paper deals with the diacritic restoration problem for Vietnamese text. Three state-of-the-art deep learning models, namely Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (Bi-LSTM), and Bidirectional Gated Recurrent Unit (Bi-GRU), were examined for the problem, and the last one turned out to be the best among them. Besides the choice of deep learning model, this paper found that word tokenization, the final pre-processing step applied to the data before feeding it to the models, also influences the final accuracy. Of the two word tokenization methods examined, morpheme-based tokenization and phrase-based tokenization, the former yields better results regardless of the deep learning model applied. The experimental results show that the combination of morpheme-based tokenization and Bi-GRU achieves the best diacritic restoration performance, with a BLEU score of 88.06%.
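
To make the task setup concrete, the sketch below shows one common way to build training pairs for diacritic restoration: strip the diacritics from correctly written Vietnamese text to obtain the model input, and keep the original text as the target. This is an illustration only, not the authors' pipeline. The function names `strip_diacritics` and `make_training_pair` are hypothetical, and splitting on whitespace is an assumption standing in for morpheme-based tokenization (Vietnamese syllables are written separated by spaces); the abstract does not specify how tokenization was implemented.

```python
import unicodedata
from typing import List, Tuple


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics from text (hypothetical helper)."""
    # The letters đ/Đ have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn), then recompose.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def make_training_pair(sentence: str) -> Tuple[List[str], List[str]]:
    """Return (source tokens without diacritics, target tokens with them)."""
    # Assumption: whitespace splitting approximates morpheme-level tokens.
    target = sentence.split()
    source = [strip_diacritics(token) for token in target]
    return source, target
```

For example, `strip_diacritics("tiếng Việt")` returns `"tieng Viet"`. A sequence model such as a Bi-GRU would then be trained to map the source tokens back to the target tokens.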


Bibliographic details

Source
IEEE International Conference on Communication, Networks and Satellite | 2021 | pp. 306-310 | 5 pages
Authors
Quang-Linh Tran; Gia-Huy Lam; Van-Binh Duong; Trong-Hop Do;
Original format: PDF
Keywords
Deep learning; Training; Satellites; Computational modeling; Logic gates; Predictive models; Prediction algorithms;


