Journal: Research journal of applied science, engineering and technology

Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair



Abstract

Creating parallel corpora and aligning them efficiently at the sentence level remains a challenge for structurally distinct languages with a relatively low degree of correlation. This work emphasizes the importance of domain-biased parallel data collection and presents a structured methodology for obtaining such data for the English-Hindi language pair. Because the participating languages are structurally distinct, sentence-level alignment of the collected data has also been undertaken. In essence, the two aspects of this study are the collection of parallel corpora from different domains and the alignment of the extracted parallel corpus at the sentence level. The proposal is intended to help researchers in Natural Language Processing improve the accuracy, precision and robustness of their own work. This is possible only when abundant parallel corpora are available, and more so only if those corpora are organized by domain and aligned at least at the sentence level. The language pair considered for the development of the algorithm is English-Hindi. Because the algorithm is generic in nature, the approach can be extended to other similarly structured language pairs.
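
The abstract does not say how the sentence-level alignment is computed, so the following is only an illustrative sketch and not the authors' algorithm. It shows a generic length-based dynamic-programming aligner, loosely in the spirit of Gale and Church, which is one common way to pair sentences of two languages. The names `align`, `length_cost`, `BEADS` and `PENALTY`, and the penalty values, are assumptions made for this example.

```python
from typing import List, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
# Hypothetical fixed penalties that favour 1-1 beads; not taken from the paper.
PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative character-length mismatch in [0, 1]; 0 means equal lengths."""
    longest = max(src_len, tgt_len)
    return 0.0 if longest == 0 else abs(src_len - tgt_len) / longest


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices) with minimal total cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward pass: extend every reachable prefix pair (i, j) by each bead type.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + PENALTY[(di, dj)] + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Backward pass: follow the stored predecessors from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(english_sentences, hindi_sentences)` on two lists of sentences returns index groups that can be used to pair the texts. Character length alone is a weak signal for a structurally distant pair such as English-Hindi, so a practical system would likely combine it with lexical or dictionary-based evidence.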
