Conference proceedings: Knowledge-Based Systems for Safety Critical Applications

Indexing weighted-sequences in large databases

Abstract

We present an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element of the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event carries a timestamp. Querying a large sequence database by the occurrence patterns of events is a first step towards understanding the temporal causal relationships among the events. The proposed index structure lets us efficiently retrieve from the database all subsequences, possibly noncontiguous, that match a given query sequence both by events and by weights. The index method also takes into account the nonuniform frequency distribution of events in the sequence data. In addition, our method has a broad range of applications in indexing scientific data consisting of multiple numerical columns, for discovering correlations among these columns. For instance, indexing a DNA microarray that records the expression levels of genes under different conditions enables us to search for genes whose responses to various experimental perturbations follow a given pattern. We demonstrate, using real-world data sets, that our method is effective and efficient.
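The query semantics described in the abstract can be illustrated with a brute-force reference matcher. This is a minimal sketch, not the authors' index structure. It assumes that a weighted-sequence is a list of (event, weight) pairs and that a subsequence matches a query when its events appear in the same order and its weights have the same offsets from the first matched element as the query's weights do, so a pattern can occur at any absolute time. The names `find_matches` and `tol` are hypothetical. An index like the one proposed exists precisely to avoid this exhaustive scan, whose cost grows quickly with sequence and query length.

```python
from typing import List, Tuple

# Assumed representation: a weighted-sequence is a list of (event, weight) pairs,
# e.g. (event name, timestamp) for a stream of network events.
WeightedSeq = List[Tuple[str, float]]


def find_matches(seq: WeightedSeq, query: WeightedSeq,
                 tol: float = 1e-9) -> List[Tuple[int, ...]]:
    """Brute-force search (hypothetical helper, not the paper's index).

    Returns the position tuples of every (possibly noncontiguous) subsequence
    of `seq` whose events equal the query's events in order and whose weight
    offsets from the first matched element equal the query's offsets
    (within `tol`).
    """
    if not query:
        return []
    q0 = query[0][1]  # weight of the first query element, used as the origin
    results: List[Tuple[int, ...]] = []

    def extend(k: int, start: int, base: float, picked: List[int]) -> None:
        # All query elements matched: record the chosen positions.
        if k == len(query):
            results.append(tuple(picked))
            return
        event, w = query[k]
        target = base + (w - q0)  # required weight in `seq` for query element k
        for j in range(start, len(seq)):
            e, sw = seq[j]
            if e == event and abs(sw - target) <= tol:
                picked.append(j)
                extend(k + 1, j + 1, base, picked)  # later elements must come after j
                picked.pop()

    # Every occurrence of the first query event is a candidate anchor.
    for i, (e, w) in enumerate(seq):
        if e == query[0][0]:
            extend(1, i + 1, w, [i])
    return results


if __name__ == "__main__":
    seq = [("A", 0), ("B", 2), ("A", 3), ("C", 5), ("B", 5), ("C", 8)]
    # Pattern: A, then B two time units later, then C five units after A.
    print(find_matches(seq, [("A", 0), ("B", 2), ("C", 5)]))
    # prints [(0, 1, 3), (2, 4, 5)]
```

In this example the second occurrence, at positions `(2, 4, 5)`, skips position 3, which is the kind of noncontiguous match the abstract refers to. Matching on relative offsets rather than absolute weights is a design choice of this sketch; other definitions of a weight match are possible.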