43rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference

Training Data Modification for SMT Considering Groups of Synonymous Sentences

Abstract

Generally speaking, statistical machine translation systems would be able to attain better performance with more training sets. Unfortunately, well-organized training sets are rarely available in the real world. Consequently, it is necessary to focus on modifying the training set to obtain high accuracy for an SMT system. If the SMT system trained the translation model, the translation pair would have a low probability when there are many variations for target sentences from a single source sentence. If we decreased the number of variations for the translation pair, we could construct a superior translation model. This paper describes the effects of modification on the training corpus when consideration is given to synonymous sentence groups. We attempt three types of modification: compression of the training set, replacement of source and target sentences with a selected sentence from the synonymous sentence group, and replacement of the sentence on only one side with the selected sentence from the synonymous sentence group. As a result, we achieve improved performance with the replacement of source-side sentences.
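
The three modifications named in the abstract can be illustrated on a toy corpus. The following is only a minimal sketch and not the authors' implementation: the function names, the `src_rep` and `tgt_rep` mappings (each sentence mapped to the sentence selected for its synonymous group), the toy sentence pairs, and the reading of "compression" as keeping one pair per group are all assumptions made for illustration. The sentence-level relative-frequency estimate is likewise a simplification of a real SMT translation model, used here only to show how many target variations for one source sentence lower the probability of each pair.

```python
from collections import Counter

def replace_both(pairs, src_rep, tgt_rep):
    """Replace source and target sentences with their group's selected sentence."""
    return [(src_rep.get(s, s), tgt_rep.get(t, t)) for s, t in pairs]

def replace_source_only(pairs, src_rep):
    """Replace only the source side; target sentences stay as they are."""
    return [(src_rep.get(s, s), t) for s, t in pairs]

def compress(pairs, src_rep, tgt_rep):
    """Assumed reading of 'compression': keep one selected pair per group."""
    kept = []
    for pair in replace_both(pairs, src_rep, tgt_rep):
        if pair not in kept:
            kept.append(pair)
    return kept

def relative_freq(pairs):
    """Sentence-level p(target | source) by relative frequency (a simplification)."""
    pair_counts = Counter(pairs)
    src_counts = Counter(s for s, _ in pairs)
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}

# Hypothetical toy corpus: two synonymous sources, two synonymous targets.
pairs = [("hello", "bonjour"), ("hello", "salut"),
         ("hi", "bonjour"), ("hi", "salut")]
src_rep = {"hi": "hello"}       # "hello" is the selected source sentence
tgt_rep = {"salut": "bonjour"}  # "bonjour" is the selected target sentence

print(relative_freq(pairs))                                   # original corpus
print(relative_freq(replace_both(pairs, src_rep, tgt_rep)))   # both sides replaced
print(relative_freq(replace_source_only(pairs, src_rep)))     # source side only
print(compress(pairs, src_rep, tgt_rep))                      # compressed corpus
```

In this toy setting, replacing both sides concentrates all probability mass on a single pair, while compression additionally shrinks the corpus. Which variant helps a real system is an empirical question; the abstract reports the gain for source-side replacement.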
