
DNA Sequence Similarity Search through Content-Based Retrieval Technique



Abstract

Deoxyribonucleic acid (DNA) sequences are difficult to compare for similarity because of their length and complexity. The challenge lies in applying digital signal processing (DSP) to highly relevant problems in DNA sequence analysis. Here, we map a one-dimensional (1D) DNA sequence onto a two-dimensional (2D) pattern using the Peano scan algorithm. Four complex values are assigned to the characters "A", "C", "T", and "G", respectively. A Fourier transform is then applied to obtain the far-field amplitude distribution of the 2D pattern. In this way, a 1D DNA sequence becomes a 2D image pattern. Features are extracted from the 2D image pattern with Principal Component Analysis (PCA), so that a DNA sequence database can be established. However, comparing features can take a long time when the database is large, because the features are often multi-dimensional. This problem is solved by building an indexing structure that acts as a filter, discarding non-relevant items and selecting a subset of candidate DNA sequences. Clustering algorithms organize the multi-dimensional feature data into the indexing structure for efficient retrieval. Accordingly, the query sequence need only be compared against the candidates rather than against every sequence in the database. In effect, our algorithm provides a pre-processing method that accelerates the DNA sequence search process. Finally, experimental results demonstrate the efficiency of the proposed algorithm for DNA sequence similarity retrieval.
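The first three stages described in the abstract (complex encoding, Peano scan, Fourier amplitude) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the four complex values (1, -1, i, -i), zero padding of unused grid cells, the choice of the smallest 3^n x 3^n grid that fits the sequence, and the function names `peano_coords`, `sequence_to_pattern`, and `fourier_amplitude`, which are hypothetical. The curve follows Peano's classical base-3 digit construction, and the transform is a plain discrete Fourier transform written with the standard library only. The PCA feature extraction and the clustering index are not shown.

```python
import cmath

# Assumed encoding: the abstract does not state which four complex values are used.
BASE_VALUES = {"A": 1 + 0j, "C": -1 + 0j, "T": 0 + 1j, "G": 0 - 1j}


def peano_coords(order):
    """Return the (x, y) cells of a Peano curve on a 3**order x 3**order grid, in visit order."""
    coords = []
    for t in range(9 ** order):
        # Base-3 digits of the curve index, most significant first.
        digits = []
        v = t
        for _ in range(2 * order):
            digits.append(v % 3)
            v //= 3
        digits.reverse()
        x = y = 0
        flips_for_y = 0  # running sum of the digits feeding x; odd parity mirrors y
        flips_for_x = 0  # running sum of the digits feeding y; odd parity mirrors x
        for i in range(order):
            dx, dy = digits[2 * i], digits[2 * i + 1]
            xd = 2 - dx if flips_for_x % 2 else dx
            flips_for_y += dx
            yd = 2 - dy if flips_for_y % 2 else dy
            flips_for_x += dy
            x = 3 * x + xd
            y = 3 * y + yd
        coords.append((x, y))
    return coords


def sequence_to_pattern(seq):
    """Lay a DNA string along the Peano curve; unused cells stay 0 (assumed padding)."""
    order = 1
    while 9 ** order < len(seq):
        order += 1
    side = 3 ** order
    grid = [[0j] * side for _ in range(side)]
    for base, (x, y) in zip(seq.upper(), peano_coords(order)):
        grid[y][x] = BASE_VALUES[base]
    return grid


def fourier_amplitude(grid):
    """Amplitude of the 2D discrete Fourier transform of a square complex grid."""
    n = len(grid)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    # The 2D DFT is separable: transform each row, then each column.
    rows = [[sum(row[x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for row in grid]
    return [[abs(sum(rows[y][u] * w[(v * y) % n] for y in range(n)))
             for u in range(n)] for v in range(n)]


if __name__ == "__main__":
    amplitude = fourier_amplitude(sequence_to_pattern("ACGTTGCAAC"))
    print(len(amplitude), "x", len(amplitude[0]), "amplitude map")
```

Consecutive bases always land in neighbouring cells, which is the reason for using a space-filling scan instead of a row-by-row layout: local runs in the sequence stay local in the 2D pattern. The naive transform costs O(n^3) for an n x n grid, so a real implementation would presumably use an FFT library.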
