Tibetan Word Segmentation as Sub-syllable Tagging with Syllable's Part-of-Speech Property


Abstract

When Tibetan word segmentation is treated as a sequence labelling problem, machine learning models such as maximum entropy (ME) and conditional random fields (CRFs) can be used to train the segmenter. The segmenter's performance depends on many factors. This paper compares three of them: the strategy for handling abbreviated syllables, the tag set, and the syllable's part-of-speech (POS) property. The experimental data show three things. First, if each abbreviated syllable is split into two units for labelling rather than treated as one, the F-measure improves by 0.06% and 0.10% on the 4-tag set and the 6-tag set, respectively. Second, if the 6-tag set is used instead of the 4-tag set, the F-measure improves by 0.10% and 0.14% under the two abbreviated-syllable strategies, respectively. Third, when the syllable's POS property is taken into account, the F-measure improves by 0.47% and 0.41% over the two methods that do not use it on the 4-tag set, and by 0.45% and 0.35% on the 6-tag set, gains that are much larger than the previous ones. It is therefore a better choice to exploit the syllable's POS property while using the sub-syllable as the tagging unit.
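To make the tag-set comparison concrete, the sketch below shows one way a segmented sentence could be turned into per-unit labels for a sequence labeller. The abstract does not name the tags, so the label inventories here (B/M/E/S for the 4-tag set and B/B2/B3/M/E/S for the 6-tag set) are an assumption borrowed from common practice in segmentation-as-tagging work, and the function names `tag_word` and `tag_sentence` are hypothetical. A unit may be a syllable or, when an abbreviated syllable is split in two, a sub-syllable.

```python
from typing import List, Tuple

# Assumed label inventories; the paper's actual tag names may differ.
FOUR_TAG = "4-tag"   # B, M, E, S
SIX_TAG = "6-tag"    # B, B2, B3, M, E, S


def tag_word(n_units: int, scheme: str = FOUR_TAG) -> List[str]:
    """Return position tags for a word made of n_units tagging units."""
    if n_units < 1:
        raise ValueError("a word must contain at least one unit")
    if n_units == 1:
        return ["S"]  # single-unit word
    if scheme == FOUR_TAG:
        return ["B"] + ["M"] * (n_units - 2) + ["E"]
    if scheme == SIX_TAG:
        # The first three positions get distinct tags; the last unit is always E.
        head = ["B", "B2", "B3"][: n_units - 1]
        middle = ["M"] * (n_units - 1 - len(head))
        return head + middle + ["E"]
    raise ValueError(f"unknown scheme: {scheme}")


def tag_sentence(words: List[List[str]], scheme: str = FOUR_TAG) -> List[Tuple[str, str]]:
    """Flatten a segmented sentence (list of words, each a list of units)
    into (unit, tag) pairs suitable as training data for a tagger."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        pairs.extend(zip(word, tag_word(len(word), scheme)))
    return pairs


if __name__ == "__main__":
    # Placeholder units, not real Tibetan syllables.
    sentence = [["u1", "u2", "u3", "u4", "u5"], ["u6"], ["u7", "u8"]]
    print(tag_sentence(sentence, FOUR_TAG))
    print(tag_sentence(sentence, SIX_TAG))
```

Under this assumed scheme the two tag sets differ only for words of three or more units, where the 6-tag set distinguishes the second and third positions instead of labelling them all M.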

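The improvements above are reported as F-measure. As a rough illustration, the sketch below computes the usual word-level F-measure for segmentation, where a predicted word counts as correct only if both of its boundaries match the gold segmentation. That this is the exact scoring used in the paper is an assumption, and the helper names `word_spans` and `f_measure` are hypothetical.

```python
from typing import List, Set, Tuple


def word_spans(words: List[List[str]]) -> Set[Tuple[int, int]]:
    """Map a segmentation (list of words, each a list of units)
    to the set of (start, end) unit offsets of its words."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def f_measure(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Harmonic mean of word-level precision and recall."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder units; the prediction wrongly merges the last two words.
    gold = [["a", "b"], ["c"], ["d"]]
    pred = [["a", "b"], ["c", "d"]]
    print(f_measure(gold, pred))
```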

Bibliographic details

Source
China National Conference on Computational Linguistics; International Symposium on Natural Language Processing Based on Naturally Annotated Big Data | 2015 | pp. 189-201 | 13 pages
Authors
Huidan Liu; Congjun Long; Minghua Nuo; Jian Wu
Original format
PDF
Keywords
Tibetan word segmentation; Tibetan; Sub-syllable tagging; CRFs; Syllable's POS property

Similar literature

1. Syllable-Pattern-Based Unknown-Morpheme Segmentation and Estimation for Hybrid Part-of-Speech Tagging of Korean [J]. Gary Geunbae Lee, Jeongwon Cha, Jong-Hyeok Lee. Computational Linguistics. 2002, No. 1
2. Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging [J]. Shivachi Casper Shikali, Mokhosi Refuoe, Zhou Shijie. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, No. 4
3. A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging [J]. Mao Cunli, Man Zhibo, Yu Zhengtao. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, No. 4
4. Tibetan Word Segmentation as Sub-syllable Tagging with Syllable's Part-of-Speech Property [C]. Huidan Liu, Congjun Long, Minghua Nuo. China National Conference on Computational Linguistics. 2015
5. The benefits of syllable segmentation and word reading practice for adolescents with reading and spelling difficulties [D]. Bhattacharya, Alpana. 2001
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019
