International Journal on Digital Libraries

An analysis and comparison of keyword recommendation methods for scientific data

Abstract

To classify and search various kinds of scientific data, it is useful to annotate those data with keywords from a controlled vocabulary. Data providers, such as researchers, annotate their own data with keywords from the provided vocabulary. However, for the selection of suitable keywords, extensive knowledge of both the research domain and the controlled vocabulary is required. Therefore, the annotation of scientific data with keywords from a controlled vocabulary is a time-consuming task for data providers. In this paper, we discuss methods for recommending relevant keywords from a controlled vocabulary for the annotation of scientific data through their metadata. Many previous studies have proposed approaches based on keywords in similar existing metadata; we call this the indirect method. However, when the quality of the existing metadata set is insufficient, the indirect method tends to be ineffective. Because the controlled vocabularies for scientific data usually provide definition sentences for each keyword, it is also possible to recommend keywords based on the target metadata and the keyword definitions; we call this the direct method. The direct method does not utilize the existing metadata set and therefore is independent of its quality. Also, for the evaluation of keyword recommendation methods, we propose evaluation metrics based on a hierarchical vocabulary structure, which is a distinctive feature of most controlled vocabularies. Using our proposed evaluation metrics, we can evaluate keyword recommendation methods with an emphasis on keywords that are more difficult for data providers to select. In experiments using real earth science datasets, we compare the direct and indirect methods to verify their effectiveness, and observe how the indirect method depends on the quality of the existing metadata set. The results show the importance of metadata quality in recommending keywords.
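The abstract contrasts two ways of recommending keywords and a hierarchy-aware evaluation. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. The following are all hypothetical choices made for illustration:

* the bag-of-words cosine similarity;
* the nearest-neighbour voting in the indirect method;
* the depth-weighted recall metric;
* the toy vocabulary and metadata (`DEFINITIONS`, `PARENT`, `EXISTING_RECORDS`).

The metric assumes that deeper, more specific keywords in the hierarchy are the ones that are harder for data providers to select.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary: keyword -> definition sentence.
DEFINITIONS = {
    "ATMOSPHERE": "the gaseous envelope surrounding the earth",
    "PRECIPITATION": "water falling from the atmosphere as rain or snow",
    "RAIN": "liquid water drops falling from clouds",
    "OCEANS": "the large bodies of salt water covering the earth",
    "SEA SURFACE TEMPERATURE": "temperature of the ocean water near the surface",
}
# Hypothetical hierarchy: child keyword -> parent keyword (roots have no entry).
PARENT = {
    "PRECIPITATION": "ATMOSPHERE",
    "RAIN": "PRECIPITATION",
    "SEA SURFACE TEMPERATURE": "OCEANS",
}
# Hypothetical existing metadata set: (description, annotated keywords).
EXISTING_RECORDS = [
    ("rain gauge network with hourly rainfall totals", ["PRECIPITATION"]),
    ("satellite sea surface temperature maps", ["SEA SURFACE TEMPERATURE", "OCEANS"]),
]

STOPWORDS = {"a", "an", "and", "as", "from", "in", "of", "or", "the", "to"}


def bag_of_words(text):
    """Lower-cased term counts with punctuation and a few stopwords removed."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return Counter(w for w in words if w and w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def direct_method(target_text, definitions, k=3):
    """Direct method (sketch): rank keywords by similarity to their definitions."""
    target = bag_of_words(target_text)
    scores = {kw: cosine(target, bag_of_words(d)) for kw, d in definitions.items()}
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:k]


def indirect_method(target_text, records, k=3, neighbours=2):
    """Indirect method (sketch): vote for keywords of the most similar records."""
    target = bag_of_words(target_text)
    scored = [(cosine(target, bag_of_words(text)), kws) for text, kws in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, keywords in scored[:neighbours]:
        if sim > 0:  # ignore records that share no terms with the target
            for kw in keywords:
                votes[kw] += sim  # each vote is weighted by record similarity
    return sorted(votes, key=lambda kw: (-votes[kw], kw))[:k]


def depth(keyword, parent):
    """Level of a keyword in the hierarchy; top-level keywords have depth 1."""
    d = 1
    while keyword in parent:
        keyword = parent[keyword]
        d += 1
    return d


def depth_weighted_recall(recommended, correct, parent):
    """Hypothetical metric: recall where each correct keyword is weighted by depth."""
    total = sum(depth(kw, parent) for kw in correct)
    hit = sum(depth(kw, parent) for kw in correct if kw in recommended)
    return hit / total if total else 0.0


if __name__ == "__main__":
    target = "daily rain gauge observations of liquid water falling from clouds"
    correct = {"RAIN", "PRECIPITATION"}
    for name, recs in [("direct", direct_method(target, DEFINITIONS)),
                       ("indirect", indirect_method(target, EXISTING_RECORDS))]:
        print(name, recs, depth_weighted_recall(recs, correct, PARENT))
```

With this toy data, the direct method ranks RAIN first because its definition overlaps the target description. The indirect method, by contrast, can only suggest keywords that already appear in the existing records, so it never proposes RAIN and scores lower on the depth-weighted recall. This mirrors, in miniature, the abstract's point that the indirect method depends on the existing metadata set.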