Genomic data analysis and processing with signal processing techniques.

Abstract

Bioinformatics is an emerging multi-disciplinary field. In this research, we study two problems originating from biological applications, using signal processing and statistical pattern recognition techniques: (1) the genomic sequence alignment problem, and (2) the data integrity problem in Single Nucleotide Polymorphism (SNP) data sets.

First, we introduce a gap sequence matching technique that facilitates the matching of genomic sequences. We study the behavior of these gap sequences and propose two methods, histogram-aided alignment (HAA) and matched filter alignment (MFA), to align pairwise and multiple genomic sequences. The main contribution is that, even with only partial knowledge of a genomic sequence (namely, its gap structure), we can accurately predict the remaining portions of the sequence using arguments from information theory. We also propose a fast gap sequence alignment system with a suffix array implementation (GSA-SA). This system outperforms the current BLAST system (build 2.2.13) in both running time and accuracy. Since BLAST is the most widely used system for sequence alignment, we expect GSA-SA to advance sequence alignment practice in the near future.

Next, we study the problem of haplotype block partitioning and missing SNP inference. We propose to measure the haplotype diversity inside a block by its entropy. Based on this measure, we develop a new algorithm called IPI (iterative partitioning-inference). The IPI algorithm consists of two steps. In the first step, a dynamic programming algorithm partitions the haplotype data into blocks so as to minimize the total block entropy. In the second step, an EM-like algorithm infers the missing SNPs in each haplotype block so as to minimize the local block diversity. The IPI algorithm iterates these two steps until the total block entropy is minimized. Experimental results show that the global IPI approach significantly improves the accuracy of the inference. We then consider a block-free framework that can accommodate larger data sets for missing SNP inference without partitioning the data into haplotype blocks. The block-free inference system can be extended to haplotype inference and missing genotype inference. The systems we developed can infer a broad range of uncertain data from the available data sets.
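The partitioning step of IPI can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not give the exact objective or constraints, so the code below assumes that haplotypes are equal-length strings with no missing entries, that block diversity is the Shannon entropy of the distinct haplotype patterns in a block, and that each block spans at most `max_len` SNPs. The names `block_entropy`, `partition_min_entropy`, and `max_len` are hypothetical. The width limit is an added assumption; without some such constraint, merging everything into a single block would never increase the summed entropy.

```python
from collections import Counter
from math import log2


def block_entropy(haplotypes, start, end):
    """Shannon entropy (in bits) of the haplotype patterns in SNP columns [start, end)."""
    counts = Counter(h[start:end] for h in haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def partition_min_entropy(haplotypes, max_len):
    """Split the SNP columns into consecutive blocks of at most max_len columns
    (an assumed constraint) so that the sum of block entropies is minimal.

    Returns (total_entropy, blocks), where blocks is a list of (start, end) pairs.
    """
    m = len(haplotypes[0])
    # best[j] is the minimal total entropy for the first j columns.
    best = [0.0] + [float("inf")] * m
    # back[j] is the start index of the last block in that optimal split.
    back = [0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(max(0, j - max_len), j):
            cost = best[i] + block_entropy(haplotypes, i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    j = m
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    blocks.reverse()
    return best[m], blocks


if __name__ == "__main__":
    # Toy data: four haplotypes over four biallelic SNPs.
    sample = ["0011", "0011", "1100", "1100"]
    print(partition_min_entropy(sample, max_len=2))
```

In the full algorithm described above, this partitioning would alternate with an inference step that fills in missing SNPs; that step is omitted here because the abstract does not specify it beyond "EM-like".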

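Bibliographic record

  • Author: Su, Shih-Chieh
  • Author affiliation: University of Southern California
  • Degree-granting institution: University of Southern California
  • Subjects: Biology, Biostatistics; Biology, Genetics; Engineering, Electronics and Electrical; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 191 p.
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biomathematical methods; Genetics; Radio electronics and telecommunications technology
  • Date indexed: 2022-08-17 11:40:54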
