Journal: ACM Transactions on Asian Language Information Processing

Rich Results From Poor Resources: NTCIR-4 Monolingual and Cross-Lingual Retrieval of Korean Texts Using Chinese and English


Abstract

We report on Korean monolingual, Chinese-Korean English-as-pivot bilingual, and Chinese-English bilingual CLIR experiments in the NTCIR-4 environment, using MT software augmented with Web-based entity-oriented translation as resources. Simple stemming improves bigram indexing for Korean retrieval. For word indexing, keeping only nouns is preferable. Web-based translation reduces the number of terms left untranslated after MT and substantially improves CLIR results. Translation concatenation consistently improves CLIR effectiveness, and combining retrieval lists from bigram and word indexing also helps. A method to disambiguate multiple MT outputs using a log likelihood ratio threshold was tested. Depending on the query type (title or description), the retrieval method (bigram only or a retrieval combination), and the evaluation (relaxed or rigid), direct bilingual CLIR returned an average precision of 71-79% (English-Korean) and 76-84% (Chinese-English) of the corresponding Korean-Korean and English-English monolingual results. Using English as a pivot in Chinese-Korean CLIR provides about 55-65% of the effectiveness of Korean monolingual retrieval. Entity/terminology translation at the pivot language stage accounts for a large portion of this deficiency. A topic with a comparatively poor Chinese-English bilingual result will not necessarily continue to underperform at the Korean retrieval level after further transitive Korean translation.
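As an illustration of two techniques the abstract names, overlapping bigram indexing and combining retrieval lists from bigram and word indexing, here is a minimal Python sketch. The abstract does not specify the tokenization details or the combination formula, so the function names (`char_bigrams`, `min_max`, `combine`), the min-max normalization, and the equal `weight` are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams for each whitespace-delimited token.

    Single-character tokens are kept as-is (an assumption of this sketch).
    """
    grams = []
    for token in text.split():
        if len(token) == 1:
            grams.append(token)
        grams.extend(token[i:i + 2] for i in range(len(token) - 1))
    return grams


def min_max(scores):
    """Scale a {doc_id: score} mapping into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(bigram_scores, word_scores, weight=0.5):
    """Fuse two retrieval lists by a weighted sum of normalized scores.

    `weight` is a hypothetical parameter; a document missing from one
    list contributes 0 from that list.
    """
    fused = defaultdict(float)
    for doc, s in min_max(bigram_scores).items():
        fused[doc] += weight * s
    for doc, s in min_max(word_scores).items():
        fused[doc] += (1.0 - weight) * s
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(char_bigrams("한국어 검색"))
    print(combine({"d1": 2.0, "d2": 1.0}, {"d2": 3.0, "d3": 1.0}))
```

Normalizing before summing keeps one index's score scale from dominating the other; other fusion rules (for example rank-based ones) would fit the same interface.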
