Conference: International Conference on Theory and Practice of Digital Libraries

A Benchmark for Content-Based Retrieval in Bivariate Data Collections


Abstract

Huge amounts of various research data are produced and made publicly available in digital libraries. An important category is bivariate data (measurements of one variable versus another). Examples of bivariate data include observations of temperature and ozone levels (e.g., in environmental observation), domestic production and unemployment (e.g., in economics), or education and income levels (e.g., in the social sciences). For accessing these data, content-based retrieval is an important query modality. It allows researchers to search for specific relationships among data variables (e.g., quadratic dependence of temperature on altitude). However, such retrieval remains a challenge, as it is not clear which similarity measures to apply. Various approaches have been proposed, yet no benchmarks to compare their retrieval effectiveness have been defined. In this paper, we construct a benchmark for retrieval of bivariate data. It is based on a large collection of bivariate research data. To define similarity classes, we use category information that was annotated by domain experts. The resulting similarity classes are used to compare several recently proposed content-based retrieval approaches for bivariate data by means of precision and recall. This study is the first to present an encompassing benchmark data set and compare the performance of the respective techniques. We also identify potential research directions based on the results obtained for bivariate data. The benchmark and implementations of similarity functions are made available to foster research in this emerging area of content-based retrieval.
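
As an illustration of the evaluation described above, the following minimal sketch computes precision and recall for a single query when relevance is defined by membership in the same similarity class. It is not the benchmark's own implementation: the function name `precision_recall_at_k`, the cutoff `k`, and the class labels in the usage example are assumptions chosen for illustration only.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(
    ranked_labels: Sequence[str],
    query_label: str,
    k: int,
    n_relevant: int,
) -> Tuple[float, float]:
    """Precision and recall over the top-k results of one query.

    ranked_labels: class labels of the retrieved items, best match first
                   (the query item itself is assumed to be excluded).
    query_label:   similarity class of the query item.
    n_relevant:    number of other items in the collection that share
                   the query's class.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_labels[:k]
    # A retrieved item counts as relevant if it is in the query's class.
    hits = sum(1 for label in top_k if label == query_label)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


# Hypothetical labels; the actual classes come from expert annotations.
ranked = ["quadratic", "linear", "quadratic", "periodic", "quadratic"]
p, r = precision_recall_at_k(ranked, "quadratic", k=3, n_relevant=4)
print(f"precision@3 = {p:.3f}, recall@3 = {r:.3f}")
```

Averaging these values over all queries, and over several cutoffs `k`, would give one way to compare similarity functions on the same collection.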
