Second Conference on Machine Translation

Automatic Threshold Detection for Data Selection in Machine Translation

Abstract

This paper presents the participation of the University of Hamburg in the Biomedical Translation Task of the Second Conference on Machine Translation (WMT 2017). Our contribution is a new direction for data selection in Machine Translation, based on Paragraph Vector and a feed-forward neural network classifier. Continuous distributed vector representations of the sentences are used as features for the binary classifier. Most data selection approaches score and rank general-domain sentences by their similarity to the in-domain data, and then set a range of thresholds to select a percentage of them for training various MT systems. The novelty of our method is an automatic threshold detection paradigm for data selection, which provides an efficient and simple way of selecting the sentences most similar to the in-domain data. The approach yields encouraging results for seven language pairs and four data sets.
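The abstract does not give implementation details, so the following is only a minimal sketch of classifier-based data selection under stated assumptions, not the authors' system. It assumes that "automatic threshold" can be read as using the binary classifier's own decision boundary (probability of at least 0.5) in place of a hand-tuned percentage cutoff. Two components are swapped for simpler stand-ins to keep the example self-contained: a hashed bag-of-words vector replaces Paragraph Vector, and a logistic regression replaces the feed-forward network. All names (`embed`, `train`, `select`, `DIM`) and the toy sentences are hypothetical.

```python
import math
import random
from typing import List, Tuple

DIM = 16  # size of the toy sentence vectors (assumption, not from the paper)


def embed(sentence: str) -> List[float]:
    """Stand-in for Paragraph Vector: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        # Deterministic bucket; the built-in hash() is salted per process.
        vec[sum(ord(c) for c in token) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that a sentence vector belongs to the in-domain class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Tuple[List[float], int]],
          epochs: int = 200, lr: float = 0.5, seed: int = 0):
    """Logistic regression trained by SGD (stand-in for the feed-forward net)."""
    rng = random.Random(seed)
    weights, bias = [0.0] * DIM, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def select(pool: List[str], weights: List[float], bias: float) -> List[str]:
    """Keep sentences the classifier labels in-domain; no ranking cutoff is tuned."""
    return [s for s in pool if predict(weights, bias, embed(s)) >= 0.5]


# Toy data (hypothetical): label 1 = in-domain, label 0 = general-domain sample.
in_domain = ["the patient received a dose of insulin",
             "clinical trial of the new vaccine"]
general_sample = ["the parliament voted on the budget",
                  "the football match ended in a draw"]
pool = ["insulin dose for the patient",
        "the budget vote in parliament",
        "a vaccine trial started"]

examples = [(embed(s), 1) for s in in_domain] + \
           [(embed(s), 0) for s in general_sample]
weights, bias = train(examples)
selected = select(pool, weights, bias)
print(selected)
```

In this reading the amount of selected data follows from the classifier's decisions, so no percentage needs to be chosen by hand. Which sentences the toy model actually keeps depends on the stand-in features and is not meant to reflect the reported results.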
