A Benchmark for Content-Based Retrieval in Bivariate Data Collections



Abstract

Large amounts of research data are produced and made publicly available in digital libraries. An important category is bivariate data (measurements of one variable versus another). Examples include observations of temperature and ozone levels (in environmental observation), domestic production and unemployment (in economics), and education and income levels (in the social sciences). Content-based retrieval is an important query modality for accessing these data. It allows researchers to search for specific relationships among data variables (e.g., quadratic dependence of temperature on altitude). Such retrieval remains a challenge, however, because it is not clear which similarity measures to apply. Various approaches have been proposed, yet no benchmarks for comparing their retrieval effectiveness have been defined. In this paper, we construct a benchmark for retrieval of bivariate data. It is based on a large collection of bivariate research data. To define similarity classes, we use category information annotated by domain experts. The resulting similarity classes are used to compare several recently proposed content-based retrieval approaches for bivariate data by means of precision and recall. This study is the first to present an encompassing benchmark data set and to compare the performance of the respective techniques. We also identify potential research directions based on the results obtained for bivariate data. The benchmark and implementations of the similarity functions are made available to foster research in this emerging area of content-based retrieval.
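The abstract names precision and recall over expert-annotated similarity classes as the evaluation criterion but does not spell out the computation. The following is a minimal sketch of one common reading, not the authors' implementation: an item counts as relevant when it carries the same class label as the query. The function name `precision_recall_at_k`, the cutoff `k`, and the class labels in the example are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(
    ranked_classes: Sequence[str],
    query_class: str,
    k: int,
) -> Tuple[float, float]:
    """Precision and recall over the top-k results of a single query.

    ranked_classes: class labels of the retrieved items, best match first
                    (assumption: the query item itself is excluded).
    query_class:    class label of the query item.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: an item is relevant iff it shares the query's class label.
    total_relevant = sum(1 for c in ranked_classes if c == query_class)
    top_k = ranked_classes[:k]
    hits = sum(1 for c in top_k if c == query_class)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical ranking returned for a query labelled "quadratic".
ranking = ["quadratic", "linear", "quadratic", "periodic", "quadratic", "linear"]
precision, recall = precision_recall_at_k(ranking, "quadratic", k=3)
print(f"precision@3 = {precision:.2f}, recall@3 = {recall:.2f}")
```

Averaging these values over all queries and several cutoffs would give one precision-recall curve per similarity measure, which is one way such approaches could be compared on a shared benchmark.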


Bibliographic details

Source
International Conference on Theory and Practice of Digital Libraries, 2012, pp. 286-297 (12 pages)
Authors
Maximilian Scherer; Tatiana von Landesberger; Tobias Schreck
Original format: PDF
Keywords
bivariate data; benchmarking; content-based retrieval; feature extraction
Date indexed: 2022-08-26 15:10:10
