Conference: International Conference on Natural Language Processing and Knowledge Engineering

Chinese base phrases chunking based on latent semi-CRF model


Abstract

In Chinese natural language processing, recognizing simple, non-recursive base phrases is an important task for applications such as information processing and machine translation. Instead of a rule-based model, we adopt a statistical machine learning method, the newly proposed Latent semi-CRF model, to solve the Chinese base phrase chunking problem. Chinese base phrase chunking can be treated as a sequence labeling problem, which involves predicting a class label for each frame in an unsegmented sequence. Chinese base phrases have sub-structures that cannot be observed in the training data. We propose a latent discriminative model called Latent semi-CRF (Latent Semi Conditional Random Fields), which combines the advantages of LDCRF (Latent Dynamic Conditional Random Fields) and semi-CRF, modeling the sub-structure of a class sequence and learning the dynamics between class labels, to detect Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, the Maximum Entropy Model, and Conditional Random Fields (including LDCRF and semi-CRF) on Chinese base phrase chunking.
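To make the sequence labeling framing concrete, here is a minimal Python sketch of the two views of a chunked sentence: one label per token (the view a plain CRF or LDCRF works with) and one label per whole segment (the view a semi-CRF works with). The BIO tag scheme, the example sentence, its chunk labels, and the helper names `segments_to_bio` and `bio_to_segments` are all illustrative assumptions; the abstract does not specify the paper's label set or data format, and this is not the authors' model.

```python
from typing import List, Optional, Tuple

# A segment is (start, end_exclusive, label); hypothetical representation.
Segment = Tuple[int, int, str]


def segments_to_bio(n_tokens: int, segments: List[Segment]) -> List[str]:
    """Token-level view: one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in segments:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_segments(tags: List[str]) -> List[Segment]:
    """Segment-level view: recover (start, end, label) spans from BIO tags.

    Simplification: an I- tag always continues the open span, whatever
    its label, and a stray I- tag with no open span is ignored.
    """
    segments: List[Segment] = []
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            segments.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return segments


# Illustrative sentence and chunking (assumed, not taken from the paper).
tokens = ["我", "喜欢", "自然", "语言", "处理"]
chunks = [(0, 1, "NP"), (1, 2, "VP"), (2, 5, "NP")]

tags = segments_to_bio(len(tokens), chunks)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

A token-level model scores each of the five tags and their transitions, while a segment-level model scores the three spans directly, which lets it use features of a whole candidate phrase such as its length.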
