ArbEngVec: Arabic-English Cross-Lingual Word Embedding Model

Abstract

Word embeddings (WE) are increasingly popular and widely used in Natural Language Processing (NLP) applications such as Machine Translation (MT), Information Retrieval (IR), and Information Extraction (IE), because they capture the semantic properties of words effectively. In this paper, we present ArbEngVec, an open-source resource that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs. We also perform both extrinsic and intrinsic evaluations of the different model variants. The extrinsic evaluation assesses model performance on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.
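The abstract does not say how the Word Translation task is scored. One common setup, assumed here purely for illustration, retrieves the nearest target-language word by cosine similarity in the shared embedding space and reports precision at 1. The sketch below uses invented three-dimensional toy vectors, and the function names `translate` and `precision_at_1` are hypothetical; none of this comes from the ArbEngVec release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate(word, source_vectors, target_vectors):
    """Return the target word whose vector is closest to the source word's vector."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))


def precision_at_1(gold_pairs, source_vectors, target_vectors):
    """Fraction of source words whose nearest target word matches the gold translation."""
    hits = sum(
        translate(src, source_vectors, target_vectors) == tgt
        for src, tgt in gold_pairs
    )
    return hits / len(gold_pairs)


# Toy vectors in a shared space, invented for illustration (assumption, not real model output).
arabic_vectors = {
    "كتاب": [0.9, 0.1, 0.0],
    "ماء": [0.0, 0.2, 0.9],
}
english_vectors = {
    "book": [0.8, 0.2, 0.1],
    "water": [0.1, 0.1, 0.95],
    "house": [0.2, 0.9, 0.1],
}

# Hypothetical gold dictionary used to score the retrieval.
gold_pairs = [("كتاب", "book"), ("ماء", "water")]
score = precision_at_1(gold_pairs, arabic_vectors, english_vectors)
print(score)
```

With real models, the two dictionaries would be replaced by the Arabic and English vocabularies of a single cross-lingual embedding model, and the exhaustive `max` scan would normally give way to a vectorised or approximate nearest-neighbour search.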
