Conference: International Conference on Computational Linguistics

Mining Large-scale Comparable Corpora from Chinese-English News Collections



Abstract

In this paper, we explore a CLIR-based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for translation knowledge mining. The initial source and target document sets are crawled from news websites and standardized uniformly. First, keywords are extracted from each source document; the extracted keywords are then translated and combined into query words according to certain criteria, and the query is run against an index built from the target document set. Meanwhile, the mappings between source and target documents are established according to the similarity value calculated by the retrieval tool. Two methods are evaluated for filtering the comparable document pairs so as to ensure the quality of the comparable corpora. Experimental results indicate that our approach is effective for the construction of Chinese-English comparable corpora.
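The pipeline described in the abstract (keyword extraction, keyword translation, retrieval against a target-side index, similarity-based filtering) can be illustrated with a minimal Python sketch. The abstract does not name the keyword extractor, the translation resource, the retrieval tool, or the two filtering methods, so everything below is an assumption made for illustration: TF-IDF keyword ranking, a toy bilingual dictionary (`ZH_EN`), cosine similarity over TF-IDF vectors, a single score threshold as the filter, and pre-segmented Chinese tokens. All names and data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data; the paper works with crawled news articles.
SOURCE_DOCS = {  # Chinese documents, assumed to be word-segmented already
    "zh1": ["经济", "增长", "经济", "市场", "报告"],
    "zh2": ["足球", "比赛", "球队", "足球", "报告"],
}
TARGET_DOCS = {
    "en1": "the economy showed strong growth as the market rallied",
    "en2": "the football team won the match in the final minute",
    "en3": "heavy rain is expected across the region this weekend",
}
# Hypothetical bilingual dictionary standing in for the translation step.
ZH_EN = {"经济": ["economy"], "增长": ["growth"], "市场": ["market"],
         "足球": ["football"], "比赛": ["match"], "球队": ["team"],
         "报告": ["report"]}


def smoothed_idf(n_docs, doc_freq):
    """Smoothed inverse document frequency (an assumed weighting)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


def top_keywords(doc_id, docs, k=3):
    """Rank one document's tokens by TF-IDF and return the top k."""
    tf = Counter(docs[doc_id])

    def weight(term):
        df = sum(1 for tokens in docs.values() if term in tokens)
        return tf[term] * smoothed_idf(len(docs), df)

    # Ties are broken by the token itself so the ranking is deterministic.
    return sorted(tf, key=lambda t: (-weight(t), t))[:k]


def translate_query(keywords, dictionary):
    """Combine every dictionary translation of the keywords into one query."""
    query = []
    for keyword in keywords:
        query.extend(dictionary.get(keyword, []))
    return query


def build_index(target_docs):
    """Build TF-IDF vectors for the whitespace-tokenized target documents."""
    tokenized = {d: text.split() for d, text in target_docs.items()}
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    idf = {t: smoothed_idf(len(tokenized), c) for t, c in df.items()}
    vectors = {d: {t: f * idf[t] for t, f in Counter(tokens).items()}
               for d, tokens in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index):
    """Score every target document against the query, best first."""
    idf, vectors = index
    qvec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scores = [(d, cosine(qvec, vec)) for d, vec in vectors.items()]
    return sorted(scores, key=lambda pair: -pair[1])


def mine_pairs(source_docs, target_docs, dictionary, threshold=0.2):
    """Keep the best target match per source document if it passes the
    threshold. The threshold filter is an assumed stand-in for the two
    filtering methods the abstract mentions but does not describe."""
    index = build_index(target_docs)
    pairs = []
    for sid in source_docs:
        query = translate_query(top_keywords(sid, source_docs), dictionary)
        best_id, best_score = retrieve(query, index)[0]
        if best_score >= threshold:
            pairs.append((sid, best_id, best_score))
    return pairs


if __name__ == "__main__":
    for sid, tid, score in mine_pairs(SOURCE_DOCS, TARGET_DOCS, ZH_EN):
        print(f"{sid} -> {tid} (similarity {score:.3f})")
```

Raising `threshold` trades recall for precision: fewer document pairs survive, but the ones that do share more of the translated keywords. A real system would presumably replace the in-memory index with a full retrieval engine and the toy dictionary with a larger translation resource.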
