DCU-Symantec submission for the WMT 2012 quality estimation task

Abstract

This paper describes the features and the machine learning methods used by Dublin City University (DCU) and SYMANTEC for the WMT 2012 quality estimation task. Two sets of features are proposed: a constrained set, which respects the data limitations suggested by the workshop organisers, and an unconstrained set, which uses data, or tools trained on data, not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider relatively novel: features based on a topic model built with Latent Dirichlet Allocation (LDA), and features based on source- and target-language syntax extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.
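
The abstract does not define how the topic-model and POS features are computed. As an illustration only, the Python sketch below shows one plausible way to turn these resources into numeric features for a single source/target sentence pair. It assumes that per-sentence topic proportions have already been inferred by an LDA model whose topics are shared by both languages, and that both sides were tagged with a common tagset (here, hypothetical universal-style tags such as `NOUN`, `VERB`, `ADJ`). The function names, the use of cosine and Hellinger distances, and the add-one-smoothed count ratios are assumptions for this sketch, not the authors' feature definitions.

```python
import math
from collections import Counter


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def topic_features(src_topics, tgt_topics):
    """Hypothetical topic-model features for one sentence pair.

    Assumes src_topics and tgt_topics are topic proportions (each summing to 1)
    inferred by an LDA model whose topics are shared across both languages.
    """
    dist = hellinger(src_topics, tgt_topics)  # also validates equal lengths
    dot = sum(a * b for a, b in zip(src_topics, tgt_topics))
    norm = math.sqrt(sum(a * a for a in src_topics)) * math.sqrt(
        sum(b * b for b in tgt_topics)
    )
    cosine = dot / norm if norm else 0.0
    return {"topic_cosine": cosine, "topic_hellinger": dist}


def pos_features(src_tags, tgt_tags, tags=("NOUN", "VERB", "ADJ")):
    """Hypothetical POS features: target/source count ratio per tag, add-one smoothed.

    Assumes both sides use the same (universal-style) tagset.
    """
    src_counts, tgt_counts = Counter(src_tags), Counter(tgt_tags)
    feats = {f"pos_ratio_{t}": (tgt_counts[t] + 1) / (src_counts[t] + 1) for t in tags}
    # Smoothed length ratio in tokens, a common baseline-style feature.
    feats["len_ratio"] = (len(tgt_tags) + 1) / (len(src_tags) + 1)
    return feats


if __name__ == "__main__":
    features = {}
    features.update(topic_features([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
    features.update(pos_features(["NOUN", "VERB", "NOUN"], ["NOUN", "NOUN", "VERB", "ADJ"]))
    for name, value in sorted(features.items()):
        print(f"{name}: {value:.3f}")
```

Merging the dictionaries, as in the `__main__` block, yields one flat feature vector per sentence pair that could then be passed to any classifier or regressor; computing features in separate groups like this also makes it easy to switch groups on and off when comparing feature combinations.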