Construct Trilingual Parallel Corpus on Demand


Abstract

This paper describes the construction of the Olympic Oriented Trilingual Corpus, built to support the development of NLP applications for Beijing 2008. Because it is designed for real NLP applications rather than for pure research, the corpus must meet multilingual, multi-domain, and multi-system requirements. The key issue, however, is determining the proper corpus scale given the time and cost allowed. To address it, the paper proposes treating better system performance in a sub-domain than in the whole corpus as the signal that the minimum necessary corpus size has been reached. The underlying hypothesis is that a multi-domain corpus should at least be large enough to reveal domain features. As the first-stage result, a Chinese-English-Japanese trilingual corpus totaling 2.4 million words has been completed, with the domains, locations, and topics of the language materials annotated in XML.
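
The corpus-scale criterion in the abstract can be read as a simple comparison, sketched below. This is only one possible reading, not the authors' implementation: the abstract does not say which systems or metrics were used, so the function name `corpus_is_sufficient`, the "higher is better" score convention, and the placeholder domain names and values are all assumptions made for illustration.

```python
from typing import Mapping


def corpus_is_sufficient(
    sub_domain_scores: Mapping[str, float],
    whole_corpus_score: float,
) -> bool:
    """Return True when every sub-domain score beats the whole-corpus score.

    Assumption: scores come from the same (unspecified) NLP system and
    metric, and a higher score means better performance.
    """
    if not sub_domain_scores:
        raise ValueError("at least one sub-domain score is required")
    return all(score > whole_corpus_score for score in sub_domain_scores.values())


# Placeholder values invented for illustration; they are not results from the paper.
example_scores = {"domain_a": 0.42, "domain_b": 0.39}
print(corpus_is_sufficient(example_scores, whole_corpus_score=0.35))
```

Requiring every sub-domain to beat the whole-corpus score is the strictest reading. A looser variant could accept a majority of sub-domains, which the abstract neither confirms nor rules out.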