Tibetan Word Segmentation Method Based on BiLSTM_CRF Model

Abstract

Tibetan word segmentation is one of the key technologies for Tibetan speech synthesis and Tibetan speech recognition. Traditional Tibetan word segmentation methods have mainly relied on a combination of rules and statistics. In the era of deep learning, it has become possible for models to learn features automatically. This paper proposes a Tibetan word segmentation method based on a bidirectional long short-term memory neural network combined with a conditional random field (BiLSTM_CRF). Tibetan sentences are first manually segmented into clauses, words, and abbreviated words. Low-frequency words are then removed to form a Tibetan dictionary. Using this dictionary, text features are extracted by training Word2vec on the corpus to obtain word vectors. These word vectors are fed into the BiLSTM model. The BiLSTM outputs are then passed as features to the CRF model, which performs four-tag (word-position) labeling to produce the Tibetan word segmentation results. Experimental results show that the proposed method achieves good segmentation performance, with a precision of 94.33%, a recall of 93.89%, and an F value of 94.11%.
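
The abstract names a four-tag labeling scheme and reports three word-level scores, but it defines neither. The minimal Python sketch below makes both concrete. It assumes the common B/M/E/S tag set (Begin, Middle, End, Single) and assumes that a predicted word counts as correct only when its span exactly matches a gold word. The abstract confirms neither assumption. The helper names and the syllable tokens "a" to "f" are hypothetical placeholders.

```python
from typing import List, Set, Tuple

# Assumed four-tag scheme: B(egin), M(iddle), E(nd), S(ingle-syllable word).


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (list of words, each a list of syllables)
    into one tag per syllable."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Recover word spans [start, end) from a tag sequence.
    A word opens at B or S and closes at E or S."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for i, tag in enumerate(tags):
        if tag in ("B", "S"):
            start = i
        if tag in ("E", "S"):
            spans.add((start, i + 1))
    return spans


def corpus_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1: a predicted word is correct
    only if its span exactly matches a gold word."""
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = tags_to_spans(g), tags_to_spans(p)
        correct += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage with hypothetical placeholder syllables "a".."f".
gold = [words_to_tags([["a"], ["b", "c"], ["d", "e", "f"]])]  # S B E B M E
pred = [["S", "B", "E", "S", "S", "S"]]  # last word wrongly split in three
print(corpus_prf(gold, pred))  # approximately (0.4, 0.667, 0.5)
```

The F value is the harmonic mean of precision and recall. With the abstract's figures, 2 × 0.9433 × 0.9389 / (0.9433 + 0.9389) ≈ 0.9411. This matches the reported 94.11% and indicates that the 94.33% figure is word-level precision.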

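
In a BiLSTM-CRF tagger, the BiLSTM produces a score for every tag at every position, and the CRF adds tag-to-tag transition scores. The output labeling is the highest-scoring tag sequence, which is found with the Viterbi algorithm. The plain-Python sketch below shows this standard linear-chain CRF decoding step; it is not the authors' code. Several parts are assumptions: the B/M/E/S tag set, the hand-set constraint scores in the hypothetical helper `bmes_constraints`, and the emission values in the usage. In a trained model, the transition, start, and end scores would be learned.

```python
from typing import List, Sequence, Tuple

TAGS = ["B", "M", "E", "S"]  # assumed four-tag scheme


def bmes_constraints(neg: float = -1e4) -> Tuple[List[List[float]], List[float], List[float]]:
    """Hypothetical helper: hand-set scores that forbid invalid B/M/E/S moves.
    A trained CRF would learn these scores instead."""
    allowed = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
    transitions = [[0.0 if b in allowed[a] else neg for b in TAGS] for a in TAGS]
    start = [0.0 if t in ("B", "S") else neg for t in TAGS]  # cannot start mid-word
    end = [0.0 if t in ("E", "S") else neg for t in TAGS]  # must end on a word boundary
    return transitions, start, end


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float],
            end: Sequence[float]) -> List[str]:
    """Highest-scoring tag path under a linear-chain CRF.

    emissions[t][j]   score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] score of tag i followed by tag j
    start[j], end[j]  scores for beginning / ending the sentence with tag j
    """
    n = len(TAGS)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            # Best previous tag i for arriving at tag j at position t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    final = [score[j] + end[j] for j in range(n)]
    best = max(range(n), key=lambda j: final[j])
    path = [best]
    for ptr in reversed(backpointers):  # walk back from the last position
        best = ptr[best]
        path.append(best)
    path.reverse()
    return [TAGS[j] for j in path]


# Usage: hypothetical emission scores (columns B, M, E, S) that favor M everywhere.
transitions, start, end = bmes_constraints()
emissions = [[1, 5, 0, 0.5], [0, 5, 1, 0], [0, 5, 2, 0]]
print(viterbi(emissions, transitions, start, end))  # ['B', 'M', 'E']
```

Taking the best tag at each position independently would output the invalid sequence M M M. Decoding the whole sequence with transition constraints returns the valid labeling B M E instead. This is the usual motivation for placing a CRF layer on top of the BiLSTM.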

Bibliographic Information

Source: International Conference on Asian Language Processing, 2018, pp. 297-302 (6 pages)
Authors: Lili Wang; Hongwu Yang
Original format: PDF
Keywords: Hidden Markov models; Neural networks; Logic gates; Training; Context modeling; Dictionaries; Tagging

Similar Documents

1. Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model [J]. Chenghai Yu, Shupei Wang, Jiajun Guo. International Journal of Technology and Human Interaction, 2019, No. 3.

2. Chinese Word Segmentation Based on Bidirectional GRU-CRF Model [J]. Jinli Che, Liwei Tang, Shijie Deng. International Journal of Performability Engineering, 2018, No. 12.

3. Word Segmentation for Burmese Based on Dual-Layer CRFs [J]. Zhang Shaoning, Mao Cunli, Yu Zhengtao. ACM Transactions on Asian Language Information Processing, 2019, No. 1.

4. Tibetan Word Segmentation Method Based on BiLSTM_CRF Model [C]. Lili Wang, Hongwu Yang. International Conference on Asian Language Processing, 2018.

5. Image Segmentation Methods Based on Tight-frame and Mumford-Shah Model [D]. Xiaohao Cai, 2012.

6. A hierarchical method based on active shape models and directed Hough transform for segmentation of noisy biomedical images; application in segmentation of pelvic X-ray images [O]. Rebecca Smith, Kayvan Najarian, Kevin Ward, 2009.

7. Ontology-Based Semantic Image Segmentation Using Mixture Models and Multiple CRFs [O]. Mohsen Zand, Shyamala Doraisamy, Alfian Abdul Halin, 2016.
