Conference: China National Conference on Computational Linguistics

Semi-supervised Learning for Mongolian Morphological Segmentation


Abstract

Previous Mongolian morphological segmentation methods rely on large labeled training sets or on complicated rules compiled by linguists. In contrast, we explore a novel semi-supervised method aimed at a practical application, statistical machine translation (SMT), in a low-resource learning setting where a small amount of labeled data and a large amount of unlabeled data are available. First, a supervised model based on conditional random fields (CRF) is trained on the small labeled set to predict morpheme boundaries. Then, a lexicon-based segmentation model, using the small labeled set as heuristic information, exploits the abundant unlabeled data to compensate for the weaknesses of the first step. Finally, we present several error correction models to revise the segmentation results. Experimental results show that our method improves segmentation compared with purely supervised learning. In addition, we integrate the morphological segmentation results into Chinese-Mongolian SMT and achieve satisfactory performance compared with the baseline.
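The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions, under stated assumptions. It assumes boundary prediction is framed as per-character labeling (a common input format for CRF taggers, not confirmed by the source) and stands in for the lexicon-based step with a simple greedy longest-suffix match. The function names, the `B`/`I` tag set, the `min_stem` parameter, and the toy Latin strings are all hypothetical and are not taken from the paper.

```python
from typing import List, Set


def boundary_labels(morphemes: List[str]) -> List[str]:
    """Turn a segmented word into per-character tags.

    'B' marks the first character of a morpheme, 'I' any later character.
    (Assumed tag set; the paper's actual label scheme is not stated.)
    """
    labels: List[str] = []
    for m in morphemes:
        if not m:
            continue  # ignore empty pieces
        labels.extend(["B"] + ["I"] * (len(m) - 1))
    return labels


def labels_to_morphemes(word: str, labels: List[str]) -> List[str]:
    """Rebuild morphemes from a word and its per-character tags."""
    out: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not out:
            out.append(ch)  # start a new morpheme
        else:
            out[-1] += ch   # extend the current morpheme
    return out


def lexicon_segment(word: str, suffixes: Set[str], min_stem: int = 2) -> List[str]:
    """Greedily strip the longest known suffix from the right end.

    Hypothetical stand-in for a lexicon-based segmenter: stops when no
    suffix in the lexicon matches or the stem would drop below min_stem.
    """
    parts: List[str] = []
    rest = word
    while len(rest) > min_stem:
        match = None
        # Try the longest candidate suffix first.
        for k in range(len(rest) - min_stem, 0, -1):
            if rest[-k:] in suffixes:
                match = rest[-k:]
                break
        if match is None:
            break
        parts.insert(0, match)
        rest = rest[: -len(match)]
    return [rest] + parts


if __name__ == "__main__":
    # Toy placeholder strings, not real Mongolian data.
    segs = lexicon_segment("abcdefg", {"de", "fg"})
    print(segs)
    print(boundary_labels(segs))
```

In a pipeline like the one the abstract outlines, tags of this kind would be the training targets for the supervised step, and a lexicon built from labeled and unlabeled data would drive the second step; how the two outputs are combined and corrected is specific to the paper and is not reproduced here.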
