Conference: International Conference on Natural Language Processing

Generating Translation Corpora in Indic Languages: Cultivating Bilingual Texts for Cross Lingual Fertilization

Abstract

We address theoretical and practical issues in the generation, processing, and management of a Translation Corpus (TC) in Indian languages, developed in a consortium-mode project (ILCI-II) under DeitY, Govt. of India. These issues are discussed here for the first time, keeping in mind the ready application of a TC in various domains of computational and applied linguistics. We first define what a TC is; describe the process of its construction; identify its features; exemplify the processes of text alignment in a TC; discuss methods of text analysis; propose restructuring of translational units; define the process of extracting translational equivalents; propose generating a bilingual lexical database and TermBank from a structured TC; and finally identify areas where a TC and the information extracted from it may be utilized. Since constructing a TC in Indian languages is full of hurdles, we try to construct a roadmap focused on techniques and methodologies that may be applied to achieve the task. The issues are brought into focus to justify the work that generated TCs for some Indian languages, for future reference and application.
