Journal: Natural Language Engineering

Automatic bilingual lexicon acquisition using random indexing of parallel corpora

Abstract

This paper presents a very simple and effective approach to using parallel corpora for automatic bilingual lexicon acquisition. The approach, which uses the Random Indexing vector space methodology, is based on finding correlations between terms based on their distributional characteristics. The approach requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. In this paper, we explain how our approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica using the Random Indexing approach applied to aligned parallel data. The acquired lexica are evaluated by comparing them to manually compiled gold standards, and we report overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.
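The abstract gives no implementation details, so the following is only a minimal sketch of how Random Indexing could be applied to aligned parallel data. It assumes that each aligned sentence pair receives one sparse random index vector, that every word on either side accumulates the index vectors of the pairs it occurs in, and that translation candidates are ranked by cosine similarity. The names `build_vectors` and `translation_candidates`, the values of `DIM` and `NONZERO`, and the toy English-Swedish corpus are illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import defaultdict

DIM = 512      # vector dimensionality (illustrative assumption)
NONZERO = 8    # number of +1/-1 entries per index vector (illustrative assumption)


def index_vector(rng):
    """Return a sparse ternary random vector as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((-1, 1)) for p in positions}


def build_vectors(aligned_pairs, seed=0):
    """For each word, sum the index vectors of the aligned pairs it occurs in."""
    rng = random.Random(seed)
    src_vecs = defaultdict(lambda: [0] * DIM)
    tgt_vecs = defaultdict(lambda: [0] * DIM)
    for src_sentence, tgt_sentence in aligned_pairs:
        iv = index_vector(rng)  # one index vector shared by both sides of the pair
        for vecs, sentence in ((src_vecs, src_sentence), (tgt_vecs, tgt_sentence)):
            for word in set(sentence.lower().split()):
                for pos, sign in iv.items():
                    vecs[word][pos] += sign
    return src_vecs, tgt_vecs


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_candidates(word, src_vecs, tgt_vecs, k=1):
    """Return the k target words whose vectors are closest to the source word's."""
    if word not in src_vecs:
        return []
    query = src_vecs[word]
    ranked = sorted(tgt_vecs, key=lambda t: cosine(query, tgt_vecs[t]), reverse=True)
    return ranked[:k]


# Toy aligned corpus (hypothetical data, for illustration only).
pairs = [
    ("the house is red", "huset är rött"),
    ("the car is red", "bilen är röd"),
    ("the house is big", "huset är stort"),
    ("a big car", "en stor bil"),
]

src_vecs, tgt_vecs = build_vectors(pairs)
print(translation_candidates("house", src_vecs, tgt_vecs))
```

In this toy corpus, "house" and "huset" occur in exactly the same aligned pairs, so their accumulated vectors coincide and "huset" ranks first. Words whose translations vary in form across pairs, such as "red", are harder to recover from so little data.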
