Conference: International Conference on Research, Innovation and Vision for the Future

Automatic Construction of English-Vietnamese Parallel Corpus through Web Mining

Abstract

Parallel corpora have become an essential resource for multilingual natural language processing, and large volumes of parallel text are now available on the internet. In this paper, we propose a simple but reliable method for constructing an English-Vietnamese parallel corpus through web mining. Our system automatically downloads and detects parallel web pages on a given domain, and builds from them a parallel corpus that is well aligned at the paragraph level and consists of completely clean text. The proposed technique can easily be applied to other language pairs. Experiments have been conducted and show promising results.
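
The abstract does not say how parallel pages are detected or how paragraphs are aligned, so the following Python sketch is an illustration only and not the authors' method. It assumes a heuristic often used in web-mined corpora: two URLs on the same domain are treated as candidate translations when they differ only in a language path segment. The `en`/`vi` markers, the function names, and the rule of pairing paragraphs by position only when both pages have the same paragraph count are all hypothetical choices made for this example.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: the site encodes language as a path segment such as /en/ or /vi/.
# Real sites may use subdomains, query parameters, or file suffixes instead.
LANG_MARKER = re.compile(r"(?<=/)(en|vi)(?=/)")


def url_key(url: str) -> str:
    """Replace the first language segment with a placeholder so that both
    language versions of a page map to the same key."""
    return LANG_MARKER.sub("*", url, count=1)


def pair_candidate_pages(urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (english_url, vietnamese_url) pairs whose URLs differ only
    in the language segment. URLs without a language segment are ignored."""
    by_key: Dict[str, Dict[str, str]] = {}
    for url in urls:
        match = LANG_MARKER.search(url)
        if match is None:
            continue
        by_key.setdefault(url_key(url), {})[match.group(1)] = url
    return sorted(
        (langs["en"], langs["vi"])
        for langs in by_key.values()
        if "en" in langs and "vi" in langs
    )


def split_paragraphs(text: str) -> List[str]:
    """Split already-cleaned page text on blank lines, dropping empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def align_paragraphs(en_text: str, vi_text: str) -> List[Tuple[str, str]]:
    """Pair paragraphs by position. As a conservative filter (an assumption
    of this sketch), pages with different paragraph counts yield no pairs."""
    en_paragraphs = split_paragraphs(en_text)
    vi_paragraphs = split_paragraphs(vi_text)
    if len(en_paragraphs) != len(vi_paragraphs):
        return []
    return list(zip(en_paragraphs, vi_paragraphs))
```

In a full pipeline, the candidate pairs would still need content-based verification (for example, comparing page structure or text length) before their paragraphs are added to the corpus; the sketch omits crawling and HTML cleaning entirely.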