
Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations



Abstract

In the post-genomic era, the amount of available genomic data has exploded, and the primary research problem has shifted from producing interesting biological data to processing and storing it efficiently. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem. In this problem we are given sequences $S_0, S_1, \ldots, S_k$, where $S_j = e_{j_1} I_{\sigma_1} e_{j_2} I_{\sigma_2} e_{j_3} I_{\sigma_3} \cdots e_{j_e} I_{\sigma_e}$, and must perform pattern matching on the set of sequences. By exploiting the extensive similarity among the sequences, we obtain data structures that are efficient in both time and memory: our solution achieves a query time of $O\left(m + vk \log e + \frac{m\,\mathrm{occ}_v\,v}{w} + \frac{PSC(p)\,m}{w}\right)$ with a memory usage of $O(N \log N + vk \log vk)$.
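The abstract does not spell out the data structures, so the following is only a minimal sketch of the general idea behind "binary encoding and word level operations", using the classic Shift-And bit-parallel matcher rather than the paper's own method. The function names (`build_masks`, `shift_and_search`, `search_all`) are hypothetical, and Python's arbitrary-precision integers stand in for a machine word of $w$ bits.

```python
def build_masks(pattern):
    """Binary-encode the pattern: bit i of masks[c] is set iff pattern[i] == c."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def shift_and_search(text, pattern):
    """Return the start positions of every occurrence of pattern in text.

    Bit i of `state` is set when pattern[0..i] matches the text ending at
    the current position, so each text character costs one shift, one OR
    and one AND on the packed state.
    """
    m = len(pattern)
    if m == 0:
        return []
    masks = build_masks(pattern)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


def search_all(sequences, pattern):
    """Naive baseline: scan each sequence independently (no shared structure)."""
    return {j: shift_and_search(s, pattern) for j, s in enumerate(sequences)}


if __name__ == "__main__":
    print(shift_and_search("ACGTACGT", "CGT"))
    print(search_all(["ACGT", "ACCT"], "CG"))
```

This baseline rescans every sequence in full. The bounds quoted in the abstract suggest the paper instead exploits the parts the sequences share, which this sketch does not attempt.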
