International Conference on Asian Language Processing

One-expression classification in Bengali and its role in Bengali-English machine translation


Abstract

This paper analyzes one-expressions in Bengali and shows that classifying them is useful for machine translation. The characteristics of one-expressions are studied in a 177-million-word corpus, and a classification scheme is proposed for grouping them. The features that contribute to the classification are identified, and a CRF-based classifier is trained on a dataset annotated by the authors that contains 2006 instances of one-expressions. The classifier is evaluated on a separate test set of 300 Bengali one-expressions that does not overlap with the training data, and it classifies the one-expressions correctly in 75% of cases. Finally, the utility of this classification is investigated for Bengali-English machine translation. Translation accuracy improves from 39% (Google Translate) to 60% (the proposed approach), and the improvement is found to be statistically significant. All of the annotated datasets, none of which existed previously, are made freely available to facilitate further research on this topic.
