Journal: Computer Speech and Language

A simplification-translation-restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation

Abstract

Integrating in-domain knowledge into an out-of-domain statistical machine translation (SMT) system is challenging because resources are scarce; in particular, in-domain bilingual corpora are often lacking. In this paper, we propose a simplification-translation-restoration (STR) framework for domain adaptation in SMT systems, taking an SMT system that translates medical records from English to Chinese as a case study. We identify the critical segments in a medical sentence and simplify them to alleviate the data sparseness problem in the out-of-domain SMT system. After the simplified sentence is translated, the translations of these critical segments are restored to their proper positions. Besides the simplification pre-processing step and the restoration post-processing step, we also enhance the translation and language models in the STR framework by using pseudo-bilingual corpora generated by the background MT system. In the experiments, we adapt an SMT system from a government document domain to a medical record domain. The results show the effectiveness of the STR framework.
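
The three stages named in the abstract (simplify, translate, restore) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how critical segments are detected or simplified, so the sketch assumes a simple lexicon lookup and placeholder substitution. The names `MEDICAL_LEXICON`, `TOY_GLOSSARY`, `toy_translate`, and the `__SEGn__` placeholder format are all hypothetical, and `toy_translate` merely stands in for a real out-of-domain SMT system.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical in-domain lexicon: critical segment -> Chinese translation.
MEDICAL_LEXICON: Dict[str, str] = {
    "chest pain": "胸痛",
    "dyspnea": "呼吸困难",
}

# Hypothetical word glossary used by the stand-in background translator.
TOY_GLOSSARY: Dict[str, str] = {"patient": "患者", "reports": "主诉", "and": "和"}


def simplify(sentence: str, lexicon: Dict[str, str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each critical segment with a placeholder and record its translation."""
    slots: List[Tuple[str, str]] = []
    # Try longer terms first so they are not split by shorter overlapping ones.
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        while True:
            placeholder = f"__SEG{len(slots)}__"
            sentence, count = pattern.subn(placeholder, sentence, count=1)
            if count == 0:
                break
            slots.append((placeholder, lexicon[term]))
    return sentence, slots


def toy_translate(sentence: str) -> str:
    """Stand-in for the out-of-domain system: word lookup, unknown tokens pass through."""
    tokens = sentence.rstrip(".").split()
    return "".join(TOY_GLOSSARY.get(tok.lower(), tok) for tok in tokens) + "。"


def restore(translated: str, slots: List[Tuple[str, str]]) -> str:
    """Put the stored segment translations back where the placeholders ended up."""
    for placeholder, translation in slots:
        translated = translated.replace(placeholder, translation)
    return translated


def str_translate(sentence: str, translate: Callable[[str], str] = toy_translate) -> str:
    """Run simplification, translation, and restoration in sequence."""
    simplified, slots = simplify(sentence, MEDICAL_LEXICON)
    return restore(translate(simplified), slots)


print(str_translate("Patient reports chest pain and dyspnea."))
# prints: 患者主诉胸痛和呼吸困难。
```

The point of the sketch is the division of labour: the background translator only sees a sentence whose rare medical terms have been replaced by tokens it can pass through, which is one simple way to reduce the sparseness problem the abstract describes.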