An analysis and comparison of keyword recommendation methods for scientific data

Youichi Ishida; Toshiyuki Shimizu; Masatoshi Yoshikawa

Abstract

To classify and search various kinds of scientific data, it is useful to annotate those data with keywords from a controlled vocabulary. Data providers, such as researchers, annotate their own data with keywords from the provided vocabulary. However, for the selection of suitable keywords, extensive knowledge of both the research domain and the controlled vocabulary is required. Therefore, the annotation of scientific data with keywords from a controlled vocabulary is a time-consuming task for data providers. In this paper, we discuss methods for recommending relevant keywords from a controlled vocabulary for the annotation of scientific data through their metadata. Many previous studies have proposed approaches based on keywords in similar existing metadata; we call this the indirect method. However, when the quality of the existing metadata set is insufficient, the indirect method tends to be ineffective. Because the controlled vocabularies for scientific data usually provide definition sentences for each keyword, it is also possible to recommend keywords based on the target metadata and the keyword definitions; we call this the direct method. The direct method does not utilize the existing metadata set and therefore is independent of its quality. Also, for the evaluation of keyword recommendation methods, we propose evaluation metrics based on a hierarchical vocabulary structure, which is a distinctive feature of most controlled vocabularies. Using our proposed evaluation metrics, we can evaluate keyword recommendation methods with an emphasis on keywords that are more difficult for data providers to select. In experiments using real earth science datasets, we compare the direct and indirect methods to verify their effectiveness, and observe how the indirect method depends on the quality of the existing metadata set. The results show the importance of metadata quality in recommending keywords.
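The contrast between the two approaches described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the bag-of-words cosine similarity, the function names `recommend_direct` and `recommend_indirect`, the `n_neighbors` parameter, and the toy keywords, definitions, and metadata records. The direct method scores each keyword by comparing the target metadata with that keyword's definition sentence; the indirect method borrows keywords from the most similar existing metadata records.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_direct(target_text, definitions, top_k=3):
    """Direct method (sketch): compare the target with keyword definitions."""
    target = Counter(tokenize(target_text))
    scores = {
        kw: cosine(target, Counter(tokenize(definition)))
        for kw, definition in definitions.items()
    }
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:top_k]


def recommend_indirect(target_text, existing, top_k=3, n_neighbors=2):
    """Indirect method (sketch): reuse keywords of similar existing metadata."""
    target = Counter(tokenize(target_text))
    ranked = sorted(
        ((cosine(target, Counter(tokenize(text))), kws) for text, kws in existing),
        key=lambda pair: -pair[0],
    )
    scores = Counter()
    for sim, kws in ranked[:n_neighbors]:
        for kw in kws:
            scores[kw] += sim  # weight each keyword by neighbor similarity
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:top_k]


# Hypothetical controlled vocabulary with one definition sentence per keyword.
definitions = {
    "SEA SURFACE TEMPERATURE": "temperature of the ocean water at the sea surface",
    "PRECIPITATION AMOUNT": "amount of rain or snow falling to the ground",
    "SOIL MOISTURE": "water content held in the soil",
}

# Hypothetical existing metadata set: (description, annotated keywords).
existing = [
    ("satellite measurements of ocean surface temperature", ["SEA SURFACE TEMPERATURE"]),
    ("rain gauge records of daily rain", ["PRECIPITATION AMOUNT"]),
    ("daily rain totals from gauge network", []),  # record with no keywords
]

target = "daily rain gauge observations"

print(recommend_direct(target, definitions, top_k=1))
print(recommend_indirect(target, existing, top_k=1))

# Strip all annotations to mimic a low-quality existing metadata set.
unannotated = [(text, []) for text, _ in existing]
print(recommend_indirect(target, unannotated, top_k=1))
```

In this toy setting both methods suggest the precipitation keyword when the existing records are annotated, but the indirect sketch returns nothing once the annotations are removed, while the direct sketch is unaffected because it never reads the existing metadata set.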

Bibliographic information

Source: International Journal on Digital Libraries, 2020, No. 3, pp. 307-327 (21 pages)
Authors: Youichi Ishida; Toshiyuki Shimizu; Masatoshi Yoshikawa
Affiliation: Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan (listed for each author)
Original format: PDF
Language: English
Keywords: Keyword recommendation; Metadata quality; Controlled vocabulary; Keyword definition
Record added: 2022-08-18 21:21:28
