Venue: International KEYSTONE Conference

Keyword Extraction from Parallel Abstracts of Scientific Publications

Abstract

In this paper, we study keyword extraction from parallel abstracts of scientific publications in Serbian and English. Keywords are extracted with the selectivity-based keyword extraction (SBKE) method, which relies on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords provides a controlled experimental environment and data, which allows a better comparison of the method's performance across languages. If we disregard keywords that do not appear in the abstracts, the achieved F1 scores are 49.57% for English and 46.73% for Serbian. When we evaluate against the whole keyword set, the F1 scores are 40.08% for English and 45.71% for Serbian. This work shows that SBKE can easily be ported to a new language, domain, and type of text (in terms of text structure). Still, the method has limitations: it can extract only words that appear in the text.
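The abstract describes SBKE only at a high level. The Python sketch below illustrates the general idea under assumptions that are not taken from the paper: the text is turned into an undirected word co-occurrence network (adjacent words are linked, and edge weights count how often a pair co-occurs), each word is scored by its selectivity, i.e. its strength (sum of incident edge weights) divided by its degree, and the highest-scoring words are returned. The published method may differ, for example by using directed networks, stopword filtering, or multiword keyword extension. The sketch also shows how F1 changes between the two evaluation settings in the abstract, by optionally dropping gold keywords that are absent from the text. All function names (`cooccurrence_network`, `selectivity`, `top_keywords`, `f1`) are hypothetical, and only single-word keywords are handled.

```python
import re
from collections import defaultdict


def cooccurrence_network(text):
    """Build an undirected, weighted word co-occurrence network.

    Assumption (not from the paper): adjacent tokens are linked, and the
    edge weight counts how often the pair co-occurs.
    """
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so Serbian letters work
    weights = defaultdict(int)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops
            weights[frozenset((a, b))] += 1
    return weights


def selectivity(weights):
    """Selectivity of each node: strength (sum of edge weights) / degree."""
    strength = defaultdict(int)
    degree = defaultdict(int)
    for edge, w in weights.items():
        for node in edge:
            strength[node] += w
            degree[node] += 1
    return {n: strength[n] / degree[n] for n in degree}


def top_keywords(text, k=5):
    """Return the k words with the highest selectivity (ties broken alphabetically)."""
    sel = selectivity(cooccurrence_network(text))
    return sorted(sel, key=lambda n: (-sel[n], n))[:k]


def f1(extracted, gold, text=None):
    """F1 between extracted and gold single-word keywords.

    If `text` is given, gold keywords that do not occur in the text are
    dropped first, mimicking the "present in the abstract" setting.
    """
    gold = {g.lower() for g in gold}
    if text is not None:
        words = set(re.findall(r"\w+", text.lower()))
        gold = {g for g in gold if g in words}
    extracted = set(extracted)
    tp = len(extracted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(extracted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

On the toy input `"network text network keyword network text"`, `top_keywords(..., k=2)` returns `['text', 'network']`: "text" has a single neighbour with link weight 3 (selectivity 3.0), while "network" has strength 5 spread over two neighbours (selectivity 2.5). Like SBKE, this sketch can only return words that occur in the text, so gold keywords missing from the abstract can never be matched; that is why scoring against the whole keyword set lowers F1.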