Source: Computational Linguistics and Intelligent Text Processing; Lecture Notes in Computer Science; 4394

Part-of-Speech Tagging Using Word Probability Based on Category Patterns

Abstract

This paper addresses part-of-speech (POS, or category) tagging based on word probabilities estimated from morpheme unigrams and the category patterns within a word. Word-N-gram-based POS-tagging models are difficult to adapt to agglutinative languages such as Korean, Turkish, and Hungarian because words in these languages are highly productive. For this reason, many stochastic studies of Korean POS tagging have been based on morpheme N-grams. However, the morpheme-N-gram model suffers from data sparseness when contextual information is added to achieve sufficient performance, and it has difficulty capturing the relationships among morphemes within a word. The proposed POS-tagging algorithm (a) resolves the data-sparseness problem by using a morpheme-unigram-based approach and (b) accounts for the relationships among morphemes within a word by estimating the weight of each morpheme's category in the category pattern that constitutes the word. The proposed model achieved performance similar to that of other models that use more than the morpheme-unigram model.
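
The abstract does not give the paper's actual formula, so the following is only a minimal sketch of the general idea: score each candidate analysis of a word from morpheme-unigram probabilities, with each morpheme's contribution weighted according to the category pattern of the word. The weighted log-sum, the probability table, the weights, the smoothing floor, the romanized morphemes, and all function names are assumptions made for illustration, not the authors' method or data.

```python
import math

# Toy morpheme-unigram probabilities P(morpheme/category).
# All values are invented for illustration.
UNIGRAM = {
    ("na", "NP"): 0.02,
    ("na", "VV"): 0.01,
    ("neun", "JX"): 0.05,
    ("neun", "ETM"): 0.04,
}

# Hypothetical per-position weights for each category pattern within a word.
PATTERN_WEIGHT = {
    ("NP", "JX"): (1.0, 0.8),
    ("VV", "ETM"): (1.0, 1.2),
}

FLOOR = 1e-6  # assumed floor for unseen morpheme/category pairs


def word_score(analysis):
    """Weighted log score of one candidate analysis.

    `analysis` is a list of (morpheme, category) pairs. Each unigram
    log-probability is scaled by the weight of its position in the
    word's category pattern (weight 1.0 if the pattern is unknown).
    """
    pattern = tuple(category for _, category in analysis)
    weights = PATTERN_WEIGHT.get(pattern, (1.0,) * len(analysis))
    return sum(
        weight * math.log(UNIGRAM.get(pair, FLOOR))
        for pair, weight in zip(analysis, weights)
    )


def best_analysis(candidates):
    """Return the candidate analysis with the highest score."""
    return max(candidates, key=word_score)


# Two competing analyses of one ambiguous word.
candidates = [
    [("na", "NP"), ("neun", "JX")],
    [("na", "VV"), ("neun", "ETM")],
]
best = best_analysis(candidates)
```

Because every candidate is scored from unigram counts alone, no N-gram context table is needed, which is the sparseness argument the abstract makes; the pattern weights are the only place where information about neighbouring morphemes within the word enters.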
