Conference: International Conference on Recent Advances in Natural Language Processing
Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction Pipeline for Under-Resourced Languages

Abstract

This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages that lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for the languages that lack one. Then, we add a machine learning module trained on a well-known English-language corpus and evaluate its performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic-language corpus to check whether a language-specific corpus brings a further improvement. On the five languages we analyzed, the results show that performance improves when a machine learning algorithm is used, even when that algorithm is not trained and tested on the same language.
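
The abstract does not say how performance is measured against each test collection. As a minimal sketch, assuming exact-match precision, recall, and F1 over normalized keyphrases (a common choice for this task, not confirmed as the authors' metric), a per-document evaluation could look like the following. The function names `normalize` and `evaluate` and the sample phrases are hypothetical and exist only for illustration.

```python
from __future__ import annotations


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(phrase.lower().split())


def evaluate(extracted: list[str], gold: list[str]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 for one document.

    Assumption: a predicted keyphrase counts as correct only if its
    normalized form is identical to a normalized gold keyphrase.
    """
    predicted = {normalize(p) for p in extracted}
    reference = {normalize(g) for g in gold}
    matches = len(predicted & reference)

    precision = matches / len(predicted) if predicted else 0.0
    recall = matches / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up phrases for illustration only; not data from the paper.
    extracted = ["Keyphrase extraction", "machine learning", "test collection", "pipeline"]
    gold = ["keyphrase extraction", "machine learning", "under-resourced languages"]
    print(evaluate(extracted, gold))
```

Because the comparison works on strings alone, the same function can score any of the five languages once a gold-standard list exists for a document, which is what makes a single evaluation routine usable across an English-trained and an Arabic-trained model.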
