Journal: ACM Transactions on Asian Language Information Processing

Conditional Random Fields for Korean Morpheme Segmentation and POS Tagging

Abstract

There has been recent interest in statistical approaches to Korean morphological analysis. However, previous studies have been based mostly on generative models, including a hidden Markov model (HMM), without utilizing discriminative models such as a conditional random field (CRF). We present a two-stage discriminative approach based on CRFs for Korean morphological analysis. Similar to methods used for Chinese, we perform two disambiguation procedures based on CRFs: (1) morpheme segmentation and (2) POS tagging. In morpheme segmentation, an input sentence is segmented into sequences of morphemes, where a morpheme unit is either atomic or compound. In the POS tagging procedure, each morpheme (atomic or compound) is assigned a POS tag. Once POS tagging is complete, we carry out a post-processing of the compound morphemes, where each compound morpheme is further decomposed into atomic morphemes, which is based on pre-analyzed patterns and generalized HMMs obtained from the given tagged corpus. Experimental results show the promise of our proposed method.
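
The data flow described in the abstract (segmentation, then POS tagging, then decomposition of compound morphemes) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation. The B/I boundary labels, the function names, the toy labelers standing in for trained CRFs, the generic tags NOUN and PARTICLE, and the lookup table standing in for the pre-analyzed patterns are all assumptions made for this example. The generalized HMM fallback mentioned in the abstract is not modeled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (morpheme unit, POS tag)


def labels_to_units(chars: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Group characters into morpheme units using B/I boundary labels.

    "B" starts a new unit, anything else continues the current one.
    The B/I scheme is an assumption of this sketch.
    """
    units: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not units:
            units.append(ch)
        else:
            units[-1] += ch
    return units


def analyze(
    sentence: str,
    segmenter: Callable[[List[str]], List[str]],
    tagger: Callable[[List[str]], List[str]],
    patterns: Dict[Tagged, List[Tagged]],
) -> List[Tagged]:
    """Run the hypothetical pipeline: segment, tag, then decompose compounds."""
    chars = list(sentence)
    # Stage 1: morpheme segmentation (a trained CRF would go here).
    units = labels_to_units(chars, segmenter(chars))
    # Stage 2: POS tagging of each atomic or compound unit (a second CRF).
    tagged = list(zip(units, tagger(units)))
    # Post-processing: split compound units found in the pattern table,
    # leaving every other unit unchanged.
    result: List[Tagged] = []
    for unit, tag in tagged:
        result.extend(patterns.get((unit, tag), [(unit, tag)]))
    return result


# Toy stand-ins for the two trained models (hypothetical, hard-coded rules).
def toy_segmenter(chars: List[str]) -> List[str]:
    return ["B" if i == 0 or ch == "에" else "I" for i, ch in enumerate(chars)]


def toy_tagger(units: List[str]) -> List[str]:
    return ["PARTICLE" if unit == "에" else "NOUN" for unit in units]


if __name__ == "__main__":
    print(analyze("학교에", toy_segmenter, toy_tagger, {}))
```

In a real system, the two toy functions would be replaced by sequence labelers trained on a tagged corpus, and the pattern table would be derived from that same corpus.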
