Conference: Workshop on multiword expressions: from theory to application

Mining Large-scale Comparable Corpora from Chinese-English News Collections



Abstract

In this paper, we explore a CLIR-based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for translation knowledge mining. The initial source and target document sets are crawled from news websites and standardized uniformly. Keywords are first extracted from each source document; the extracted keywords are then translated and combined into query words according to certain criteria, and the query is run against an index built from the target document set. Meanwhile, the mapping correlations between source and target documents are established according to the similarity value calculated by the retrieval tool. Two methods are evaluated for filtering the comparable document pairs so as to ensure the quality of the comparable corpora. Experimental results indicate that our approach is effective for the construction of Chinese-English comparable corpora.
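The pipeline described in the abstract (keyword extraction, keyword translation, retrieval against the target set, then filtering by similarity) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the toy dictionary `TOY_DICT`, the frequency-based `extract_keywords`, the bag-of-words cosine score standing in for the retrieval tool's similarity, and the single score threshold standing in for the two filtering methods, which the abstract does not specify. Source documents are assumed to be already word-segmented.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary (Chinese term -> English translations).
TOY_DICT = {
    "地震": ["earthquake"],
    "救援": ["rescue"],
    "股市": ["stock", "market"],
    "下跌": ["fall", "drop"],
}

def extract_keywords(tokens, k=3):
    """Return the k most frequent tokens (a stand-in for a real keyword extractor)."""
    return [term for term, _ in Counter(tokens).most_common(k)]

def build_query(keywords, dictionary):
    """Translate each keyword and pool all translations into one bag-of-words query."""
    query = []
    for kw in keywords:
        query.extend(dictionary.get(kw, []))  # keywords without a translation are dropped
    return query

def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def mine_pairs(source_docs, target_docs, dictionary, threshold=0.3):
    """For each source document, keep its best-scoring target if the score passes the threshold."""
    pairs = []
    for sid, stext in source_docs.items():
        query = build_query(extract_keywords(stext.split()), dictionary)
        scored = [(cosine(query, ttext.lower().split()), tid) for tid, ttext in target_docs.items()]
        score, tid = max(scored, default=(0.0, None))
        if tid is not None and score >= threshold:
            pairs.append((sid, tid, score))
    return pairs

# Toy data: pre-segmented Chinese source documents and English target documents.
source_docs = {"zh1": "地震 救援 地震 灾区", "zh2": "股市 下跌 股市"}
target_docs = {
    "en1": "earthquake rescue teams reach the area",
    "en2": "stock market drop worries investors",
    "en3": "football final tonight",
}

for sid, tid, score in mine_pairs(source_docs, target_docs, TOY_DICT):
    print(sid, tid, round(score, 3))
```

In a realistic setting the linear scan over `target_docs` would be replaced by a query against a proper inverted index, and the threshold would be tuned on held-out document pairs; both choices are outside what the abstract states.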
