Conference: China National Conference on Computational Linguistics; International Symposium on Natural Language Processing Based on Naturally Annotated Big Data

Integrating Multi-source Bilingual Information for Chinese Word Segmentation in Statistical Machine Translation

Abstract

Chinese text is written without spaces between words, which poses a problem for Chinese-English statistical machine translation (SMT). The most widely used approach in existing SMT systems is to train the standard translation model on a fixed segmentation produced by an off-the-shelf Chinese word segmentation (CWS) system. Such an approach is sub-optimal and ill-suited to SMT. We propose a joint model that integrates multi-source bilingual information to optimize segmentation for SMT. We also propose an unsupervised algorithm that iteratively improves the quality of the joint model. Experiments show that our method improves both segmentation and translation performance across different data environments.