Part-of-Speech Tagging Using Word Probability Based on Category Patterns

Abstract

This paper focuses on part-of-speech (POS, category) tagging based on word probability, estimated using morpheme unigrams and the category patterns within a word. Word-N-gram-based POS-tagging models are difficult to adapt to agglutinative languages such as Korean, Turkish, and Hungarian because of the high productivity of words. Many stochastic studies on Korean POS tagging have therefore been based on morpheme N-grams. However, the morpheme-N-gram model also struggles with data sparseness when contextual information is augmented to ensure sufficient performance. In addition, the model has difficulty capturing the relationships among morphemes within a word. The present POS-tagging algorithm (a) resolves the data-sparseness problem through a morpheme-unigram-based approach and (b) accounts for the relationships among morphemes within a word by estimating the weight of a morpheme's category in the category pattern that constitutes the word. The proposed model achieved performance similar to that of other models that use more than morpheme unigrams.
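
The general idea of scoring a word's candidate analyses with morpheme unigrams and a category pattern can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not give. As an assumption, the category-pattern weight is simplified here to a smoothed relative frequency of the whole pattern, and every name (`train`, `score`, `best_analysis`, `alpha`), the romanized toy morphemes, and the tag labels are hypothetical.

```python
import math
from collections import Counter

def train(analyses):
    """Count (morpheme, category) unigrams and per-word category patterns.

    `analyses` is a list of words, each a list of (morpheme, category) pairs.
    """
    unigram = Counter()
    pattern = Counter()
    for analysis in analyses:
        unigram.update(analysis)
        pattern[tuple(cat for _, cat in analysis)] += 1
    return unigram, pattern

def score(analysis, unigram, pattern, alpha=1.0):
    """Log-score of one candidate analysis of a word.

    Assumption: score = log P(category pattern) + sum of log P(morpheme, category),
    both with add-alpha smoothing. This stands in for the paper's weighting.
    """
    total_m = sum(unigram.values())
    total_p = sum(pattern.values())
    n_m = len(unigram) + 1   # +1 reserves mass for unseen morphemes
    n_p = len(pattern) + 1   # +1 reserves mass for unseen patterns
    cats = tuple(cat for _, cat in analysis)
    logp = math.log((pattern[cats] + alpha) / (total_p + alpha * n_p))
    for pair in analysis:
        logp += math.log((unigram[pair] + alpha) / (total_m + alpha * n_m))
    return logp

def best_analysis(candidates, unigram, pattern):
    """Pick the highest-scoring candidate analysis for a word."""
    return max(candidates, key=lambda a: score(a, unigram, pattern))

# Toy training data (hypothetical): pronoun + particle seen 3 times,
# verb + ending seen once.
corpus = [[("na", "NP"), ("neun", "JX")]] * 3 + [[("nal", "VV"), ("neun", "ETM")]]
unigram, pattern = train(corpus)

# Two competing analyses of the same surface word.
candidates = [
    [("na", "NP"), ("neun", "JX")],
    [("nal", "VV"), ("neun", "ETM")],
]
chosen = best_analysis(candidates, unigram, pattern)
```

Because only unigram and pattern counts are stored, no morpheme-bigram or longer context table is needed, which is the data-sparseness argument the abstract makes. A real tagger would also need a morphological analyzer to produce the candidate analyses.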


Bibliographic record

Source
Computational Linguistics and Intelligent Text Processing; Lecture Notes in Computer Science; 4394 | 2007 | pp. 119-130 | 12 pages
Conference location: Mexico City (MX)
Authors
Mi-young Kang; Sung-won Jung; Kyung-soon Park; Hyuk-chul Kwon;
Author affiliations

Pusan National University, Korean Language Processing Laboratory, Department of Computer Science Engineering, Jangjeon-dong, Geumjeong-gu, 609-735, Busan, Korea;

Pusan National University, Center for U-Port IT Research and Education Jangjeon-dong, Geumjeong;

Original format: PDF
Text language: eng
Chinese Library Classification: programming languages, algorithmic languages

Similar literature


1. A TENGRAM method based part-of-speech tagging of multi-category words in Hindi language [J]. J.P. Gupta, Devendra K. Tayal, Arti Gupta. Expert Systems with Applications. 2011, No. 12
2. A New Part-of-Speech Tagging System Based on Closed-Words, Word Form and Rules [J]. WU Yan, LI Xiukun, WANG Kaizhu. Journal of Harbin Institute of Technology. 1999, No. 1
3. A New Part-of-Speech Tagging System Based on Closed-words, Word Form and Rules [J]. WU Yan, LI Xiukun, WANG Kaizhu. Journal of Harbin Institute of Technology (English Edition). 1999, No. 001
4. Part-of-Speech Tagging Using Word Probability Based on Category Patterns [C]. Mi-young Kang, Sung-won Jung, Kyung-soon Park. International Conference on Computational Linguistics and Intelligent Text Processing. 2007
5. Voltage stability analysis based on probability theory [D]. Zhang, Jianfen. 2010
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019
7. Generating a Category Set of Words Using a Hierarchical Part-of-Speech System and Tagged Corpus [O]. 小島丈幸, 小谷善行. 2002
