Published in: International Conference on Web Information Systems Engineering

Evaluating Similarity Measures for Dataset Search


Abstract

Dataset search engines help scientists find research datasets for scientific experiments. Current dataset search engines are query-driven, so their usefulness is limited by how well the user can specify a search query. An alternative is to adopt a recommendation paradigm ("if you like this dataset, you'll also like..."). Such a recommendation service requires an appropriate similarity metric between datasets. Various similarity measures have been proposed in computational linguistics and information retrieval. The goal of this paper is to determine which similarity measure is suitable for a dataset search engine. We report experiments with different similarity measures over datasets and evaluate these measures against gold standards developed for Elsevier DataSearch, a commercial dataset search engine. Using the F-measure and nDCG as evaluation measures, we find that Wu-Palmer similarity, a similarity measure based on hierarchical terminologies, scores quite well in our benchmarks.
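As a rough illustration of two of the measures named in the abstract, the sketch below computes Wu-Palmer similarity over a tiny term hierarchy and nDCG over a ranked list of relevance grades. The taxonomy, the function names, and the depth convention (the root counts as depth 1) are assumptions made for this example; they are not taken from the paper or from Elsevier DataSearch.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not from the paper.
PARENT = {
    "science": None,
    "biology": "science",
    "chemistry": "science",
    "genetics": "biology",
    "ecology": "biology",
}


def path_to_root(term):
    """Return the chain of terms from `term` up to the root (term first)."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first ancestor of `a` that is also an ancestor of `b` is the
    # deepest common subsumer, because the path is walked bottom-up.
    lcs = next(node for node in path_to_root(a) if node in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(wu_palmer("genetics", "ecology"))    # siblings under "biology"
    print(wu_palmer("genetics", "chemistry"))  # only share the root
    print(ndcg([3, 2, 1]), ndcg([1, 2, 3]))
```

In this sketch, sibling terms under a deep common ancestor score higher than terms that share only the root, which is the behaviour a hierarchy-based measure is meant to capture. In a recommendation setting, the relevance grades passed to `ndcg` would come from a gold standard such as the one the abstract mentions.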
