Venue: International Conference on Knowledge Engineering and Knowledge Management

Automatic Subject Metadata Generation for Scientific Documents Using Wikipedia and Genetic Algorithms

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods.
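The two stages named in the abstract, thesaurus-based candidate selection and a genetically learned ranking model, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not specify the features, the fitness function, or the genetic operators, so the sketch uses two toy features (frequency and earliness), mean F1 against human keyphrases as fitness, and truncation selection with uniform crossover and Gaussian mutation. `WIKI_TITLES` is a hypothetical in-memory stand-in for a Wikipedia title index, and all function names are invented here.

```python
import random
import re

# Toy stand-in for a Wikipedia title index (hypothetical data, not from the paper).
WIKI_TITLES = {"genetic algorithm", "machine learning", "wikipedia", "thesaurus", "metadata"}

def extract_candidates(text, titles=WIKI_TITLES, max_len=3):
    """Map each n-gram that matches a title to (frequency, earliness) features."""
    words = re.findall(r"[a-z]+", text.lower())
    feats = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in titles:
                # Earliness is fixed at the first occurrence; frequency accumulates.
                freq, early = feats.get(phrase, (0, 1.0 - i / len(words)))
                feats[phrase] = (freq + 1, early)
    return feats

def rank(feats, weights, k):
    """Score candidates by a weighted feature sum and keep the top k."""
    score = lambda p: sum(w * f for w, f in zip(weights, feats[p]))
    return sorted(feats, key=lambda p: (-score(p), p))[:k]

def fitness(weights, docs, k=2):
    """Mean F1 between the top-k candidates and the human-assigned keyphrases."""
    total = 0.0
    for text, gold in docs:
        picked = set(rank(extract_candidates(text), weights, k))
        hits = len(picked & gold)
        if hits:
            p, r = hits / len(picked), hits / len(gold)
            total += 2 * p * r / (p + r)
    return total / len(docs)

def evolve(docs, pop_size=20, generations=30, seed=0):
    """Evolve weight vectors with elitism, uniform crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, docs), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [g + rng.gauss(0, 0.1) for g in child]    # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, docs))

# Two made-up training documents with made-up human keyphrases.
docs = [
    ("Machine learning with a genetic algorithm. The genetic algorithm tunes weights. "
     "Wikipedia is mentioned once.", {"machine learning", "genetic algorithm"}),
    ("Metadata for documents. Wikipedia serves as a thesaurus, and metadata quality matters.",
     {"metadata", "thesaurus"}),
]
best = evolve(docs)
print(best, fitness(best, docs))
```

A real system would replace the title set with a lookup against Wikipedia itself and use a richer feature vector; the point here is only the shape of the pipeline, in which the thesaurus limits candidates to known concepts and the evolved weights decide which of them are kept.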
