Conference: International Conference on Database Systems for Advanced Applications

Towards Order-Preserving SubMatrix Search and Indexing



Abstract

Order-Preserving SubMatrix (OPSM) has proven important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expressions across a subset of conditions. Given an OPSM query based on row or column keywords, it is desirable to retrieve OPSMs quickly from a large gene expression dataset or from OPSM data via indices. However, mining OPSMs from a gene expression dataset takes a long time, and the volume of OPSM data is huge. In this paper, we investigate the issues of indexing these two kinds of data and first present a naive solution, pfTree, which applies a prefix tree. Because searching the tree is not efficient, we propose an optimized indexing method, pIndex. Unlike pfTree, pIndex employs row and column header tables to traverse related branches in a bottom-up manner. Further, two pruning rules based on the number and order of keywords are introduced. To reduce the number of column keyword candidates for fuzzy queries, we introduce a First Item of keywords roTation method (FIT), which reduces that number from n! to n. We conduct extensive experiments with real datasets on a single machine, Hadoop, and Hama, and the experimental results show the efficiency and scalability of the proposed techniques.
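The abstract does not spell out the data structures, so the following is only a minimal sketch of the general idea, not the authors' pfTree, pIndex, or FIT. It assumes an OPSM is stored as an ordered column sequence plus a set of row identifiers, that a column query matches when its keywords appear in that order within the sequence, and that "rotation" means cyclic shifts of the keyword list. The names `PrefixTree`, `search`, and `rotations` are hypothetical.

```python
from itertools import permutations


class PrefixTreeNode:
    def __init__(self):
        self.children = {}  # column id -> PrefixTreeNode
        self.rows = set()   # rows of OPSMs whose column sequence ends here


class PrefixTree:
    """Hypothetical prefix tree over OPSM column sequences (illustration only)."""

    def __init__(self):
        self.root = PrefixTreeNode()

    def insert(self, columns, rows):
        # Walk or create one node per column, in OPSM column order.
        node = self.root
        for col in columns:
            node = node.children.setdefault(col, PrefixTreeNode())
        node.rows.update(rows)

    def search(self, keywords):
        """Naive top-down scan: return (column sequence, rows) pairs whose
        columns contain `keywords` as an ordered subsequence."""
        results = []

        def walk(node, path, matched):
            if matched == len(keywords) and node.rows:
                results.append((tuple(path), frozenset(node.rows)))
            for col, child in node.children.items():
                hit = matched < len(keywords) and col == keywords[matched]
                path.append(col)
                walk(child, path, matched + 1 if hit else matched)
                path.pop()

        walk(self.root, [], 0)
        return results


def rotations(keywords):
    """Assumed reading of rotation: the n cyclic shifts of the keyword list."""
    kws = list(keywords)
    return [tuple(kws[i:] + kws[:i]) for i in range(len(kws))]


if __name__ == "__main__":
    tree = PrefixTree()
    tree.insert(["c1", "c2", "c3"], {"g1", "g2"})
    tree.insert(["c1", "c3"], {"g3"})
    tree.insert(["c2", "c1"], {"g4"})
    print(tree.search(["c1", "c3"]))
    kws = ["c1", "c2", "c3"]
    # n rotations versus n! permutations of the same keywords
    print(len(rotations(kws)), len(list(permutations(kws))))
```

The top-down scan visits every branch, which illustrates why a naive prefix-tree search can be slow. A bottom-up traversal driven by header tables, as the abstract describes for pIndex, would instead start only from nodes that contain a queried keyword.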
