Distance-based indexing: Observations, applications, and improvements.

Abstract

Multidimensional indexing has long been an active research problem in computer science. Most solutions map complex data types to fixed-length, high-dimensional vectors and apply either Spatial Access Methods (SAMs) or Point Access Methods (PAMs) to the vectorized data.

More recently, however, this approach has reached its limits. Much current data is either difficult to map to a fixed-length vector (such as arbitrary-length strings) or maps successfully only to a very high number of dimensions. In both cases, distance-based indexing is an attractive alternative: it relies only on the pairwise distances between data items to build indices that support efficient similarity search.

In this work, distance-based indexing is first approached in a general fashion: a framework is laid out that encompasses distance-based indexing methods as well as SAMs and PAMs. Shared properties of various seemingly unrelated data structures can be exploited, as shown by a single (and optimal) search algorithm that works on a variety of trees for a variety of search types.

The motivation for distance-based indexing is then shown through an application to indexing strings (specifically, biological sequences). Simply by showing that a distance function satisfies the properties of a metric, it is illustrated that many forms of data, with various distribution characteristics, can be successfully indexed with distance-based indexing.

Finally, a probabilistic approach to indexing leads to an improved tree construction algorithm, as well as an information-based search algorithm that searches the information stored in any data structure regardless of its form (i.e., the algorithm performs equally well whether the structure is a tree or a matrix).
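
To illustrate why the metric property matters, the following is a minimal sketch of distance-based range search over strings. It is not the thesis's algorithm: it assumes Levenshtein edit distance as the metric and a simple table of precomputed distances to a few pivot strings, and the names `edit_distance` and `PivotIndex` are hypothetical. The triangle inequality guarantees that |d(q, p) - d(x, p)| <= d(q, x), so any item whose pivot distance differs from the query's by more than the search radius can be discarded without computing d(q, x).

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings of arbitrary length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot table: stores only item-to-pivot distances."""

    def __init__(self, items: Sequence[str], pivots: Sequence[str],
                 dist: Callable[[str, str], int]) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute d(x, p) for every item x and pivot p at build time.
        self.table = [[dist(x, p) for p in self.pivots] for x in self.items]

    def range_search(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return items within `radius` of `query` and the number of
        distance computations performed at query time."""
        q_to_pivots = [self.dist(query, p) for p in self.pivots]
        computed = len(self.pivots)
        results = []
        for item, row in zip(self.items, self.table):
            # Triangle inequality lower bound on d(query, item).
            if any(abs(qp - xp) > radius for qp, xp in zip(q_to_pivots, row)):
                continue  # pruned without computing the real distance
            computed += 1
            if self.dist(query, item) <= radius:
                results.append(item)
        return results, computed


sequences = ["ACGT", "ACGA", "TTTT", "GGGGGGGG", "ACG"]
index = PivotIndex(sequences, pivots=["ACGT"], dist=edit_distance)
matches, n_computed = index.range_search("ACGT", radius=1)
print(matches, n_computed)
```

Nothing in the index depends on the items being vectors. Any distance function that satisfies the metric properties could be passed as `dist`, which is the sense in which such an index works for data that is hard to map to fixed-length vectors.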

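Bibliographic record

  • Author: Tasan, Murat
  • Author affiliation: Case Western Reserve University
  • Degree-granting institution: Case Western Reserve University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 198 p.
  • Total pages: 198
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Automation technology, computer technology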