Approximate string matching in DNA sequences

Abstract

Approximate string matching on large DNA sequence data is very important in bioinformatics. Some studies have shown that the suffix tree is an efficient data structure for approximate string matching, and that it performs better than the suffix array if the data structure can be stored entirely in memory. However, our study finds that the suffix array is much better than the suffix tree for indexing DNA sequences, since, owing to its size, the data structure has to be created and stored on disk. We propose a novel auxiliary data structure that greatly improves the efficiency of the suffix array for the approximate string matching problem in the external memory model. The second problem we tackle is parallel approximate matching in DNA sequences. We propose two novel parallel algorithms for this problem and implement them on a PC cluster. The results show that when the number of allowed errors is small, directly partitioning the array over the machines in the cluster is the more efficient approach. On the other hand, when the number of allowed errors is large, partitioning the data over the machines is the better approach.
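
The abstract does not describe the paper's auxiliary data structure or its two parallel algorithms, so the sketch below is not a reconstruction of either. It only illustrates the underlying problem: k-differences approximate matching (report every text position where a substring within edit distance k of the pattern ends), solved with Sellers' classic dynamic-programming scan rather than the suffix-array index studied in the paper. A second function sequentially simulates one plausible way of "partitioning the data over the machines": each hypothetical worker gets a contiguous text slice plus an overlap of m + k - 1 preceding characters. The function names, the overlap rule and the slicing scheme are illustrative assumptions.

```python
from typing import List


def sellers_search(text: str, pattern: str, k: int) -> List[int]:
    """Return every 0-based end position j such that some substring of
    `text` ending at j is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix of
    # the text read so far. Row 0 stays 0 because a match may start at any
    # text position; before any text is read, pattern[:i] costs i deletions.
    col = list(range(m + 1))
    hits: List[int] = []
    for j, c in enumerate(text):
        prev_diag = 0  # holds col[i - 1] from the previous column
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            best = min(
                col[i] + 1,        # text character c left unaligned
                col[i - 1] + 1,    # pattern character i - 1 left unaligned
                prev_diag + cost,  # match (cost 0) or substitution (cost 1)
            )
            prev_diag, col[i] = col[i], best
        if col[m] <= k:
            hits.append(j)
    return hits


def partitioned_search(text: str, pattern: str, k: int, parts: int) -> List[int]:
    """Sequentially simulate splitting `text` into `parts` (>= 1) contiguous
    slices, one per hypothetical worker. Each worker also reads the m + k - 1
    characters before its slice, because an occurrence with at most k edits
    spans at most m + k characters; it reports only end positions inside its
    own slice, so every hit is found exactly once."""
    n, m = len(text), len(pattern)
    overlap = max(0, m + k - 1)
    size = max(1, -(-n // parts))  # ceiling division, at least 1
    hits: List[int] = []
    for start in range(0, n, size):
        end = min(n, start + size)
        lo = max(0, start - overlap)
        for j in sellers_search(text[lo:end], pattern, k):
            if lo + j >= start:  # keep only ends owned by this slice
                hits.append(lo + j)
    return hits


print(sellers_search("ACGTTGCA", "GTT", 1))
print(partitioned_search("ACGTTGCA", "GTT", 1, 3))
```

Both calls print `[3, 4, 5]`: the substrings "GT", "GTT" and "GTTG" end at positions 3, 4 and 5 and are each within one edit of "GTT". On a real cluster each slice would be searched on a separate machine; here the loop runs them one after another.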

Bibliographic Information

Source: International Conference on Database Systems for Advanced Applications, 2003, 8 pages
Authors: Lok-Lam Cheng; David W. Cheung; Siu-Ming Yiu
Original format: PDF
Chinese Library Classification: TP311.13-53

Similar Literature

1. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J]. Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin. Journal of Supercomputing, 2018, No. 5.
2. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J]. Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin. Journal of Supercomputing, 2018, No. 5.
3. Optimal implementations of the approximate string matching and the approximate discrete signal matching on the memory machine models [J]. Koji Nakano. Parallel Algorithms and Applications, 2014, No. 1-2.
4. Approximate string matching in DNA sequences [C]. Lok-Lam Cheng, Cheung, D.W. 2003.
5. Discovering motifs in DNA and protein sequences: The approximate common substring problem [D]. Bailey, Timothy Lawrence. 1995.
6. libFLASM: a software library for fixed-length approximate string matching [O]. Lorraine A. K. Ayad, Solon P. Pissis, Ahmad Retha. 2016.
7. Approximate String Matching for Searching DNA Sequences [O]. Jolanta Kawulok. 2013.