Generating Translation Corpora in Indic Languages: Cultivating Bilingual Texts for Cross Lingual Fertilization


Abstract

We address some theoretical and practical issues relating to the generation, processing, and management of a Translation Corpus (TC) in Indian languages, developed in a consortium-mode project (ILCI-II) under the DeitY, Govt. of India. These issues are discussed here for the first time, keeping in mind the ready application of a TC in various domains of computational and applied linguistics. We first define what a TC is; describe the process of its construction; identify its features; exemplify the processes of text alignment in a TC; discuss methods of text analysis; propose restructuring of translational units; define the process of extracting translational equivalents; propose generating a bilingual lexical database and TermBank from a structured TC; and finally identify areas where a TC and the information extracted from it may be utilized. Since the construction of a TC in Indian languages is full of hurdles, we try to construct a roadmap with a focus on techniques and methodologies that may be applied to achieve the task. The issues are brought into focus to justify the work that generated TCs for some Indian languages, for future reference and application.


Bibliographic details

Source
International Conference on Natural Language Processing | 2015 | pp. 329-338 | 10 pages
Authors
Niladri Sekhar Dash; Arulmozi Selvraj; Mazhar Hussain
Original format: PDF

