Information fusion of multiple genomic sensors for clustering and cis-regulatory element identification.

Abstract

The use of computational techniques for analyzing genomic data has grown rapidly in recent years, especially with the advent of high-throughput technologies and the availability of genome-wide DNA sequences. Co-regulated genes are often involved in similar cellular and biological processes, controlled by common regulators. Genes with similar patterns of expression often exhibit similar regulatory behavior. Further, control sequences corresponding to common regulators may be identified within non-coding DNA sequences of co-expressed genes.

Clustering techniques may be used to identify cohorts of genes with similar expression patterns. The results of a clustering algorithm and the quality of the clusters depend largely on the choice of distance measure used to calculate similarity. A novel clustering algorithm, which uses Kullback-Leibler (KL) divergence to estimate gene similarity, is presented. The KL Clustering algorithm has been applied successfully to Heart Rate Variability data. Due to systematic and experimental variations, gene expression measurements are often noisy. Individual expression profiles are modeled as Gaussian Radial Basis Functions (GRBF) to address this problem. A new approximation method to evaluate KL divergence for GRBFs is introduced. Microarray data alone are limited in their power to identify co-regulated genes. A Combined Clustering algorithm that is capable of incorporating diverse sources of information simultaneously is presented.

Transcriptional regulation is mediated by the interaction between transcription factors and their DNA binding sites, represented by short sequences usually present near the promoter regions of genes. Co-regulated genes often have one or more regulators in common. A search for common sequence patterns within DNA sequences of clustered genes can be used to identify transcription factor binding sites (regulatory elements or motifs). A novel method to identify regulatory elements that discriminate between prespecified gene clusters is presented. The algorithm, based on the Naïve Bayes technique, uses a string-based model to represent motifs. Since the motifs are discriminative, the need for background distributions is completely eliminated. The method is capable of integrating diverse data sources such as gene expression data, sequence data and phylogenetic information (e.g. sequence conservation across species). An evaluation of the identified motifs on mouse genes indicates that comparative genomics significantly improves the quality of the predictions. A new interactive motif visualization tool, MotifTreeViz, is presented.

Preliminary results on several real data sets indicate that this suite of algorithms produces results that are biologically significant. All the algorithms are designed to be scalable. The software is made publicly available via a web user interface at http://biogeowarehouse.cse.psu.edu. Additionally, the individual programs may be downloaded at http://www.cse.psu.edu/~jkasturi/Software.htm.
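As a rough illustration of using KL divergence as a similarity measure between expression profiles, the sketch below computes the closed-form KL divergence between two univariate Gaussians and symmetrizes it so it can serve as a distance. This is not the dissertation's method: the abstract describes GRBF models and a new approximation that are not reproduced here. Summarizing each profile as a single (mean, standard deviation) pair is an assumption made only for this example, and the function names `kl_gaussian`, `symmetric_kl`, and `nearest_profile` are hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for univariate Gaussians p and q."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def symmetric_kl(p, q):
    """Symmetrized divergence KL(p || q) + KL(q || p).

    p and q are (mean, std) tuples; this summary of a profile is an
    assumption of the sketch, not the GRBF model from the abstract.
    """
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)


def nearest_profile(query, candidates):
    """Return the index of the candidate closest to query."""
    return min(
        range(len(candidates)),
        key=lambda i: symmetric_kl(query, candidates[i]),
    )
```

A clustering routine could call `symmetric_kl` wherever it would otherwise use Euclidean distance; symmetrizing matters because plain KL divergence depends on the order of its arguments.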

Bibliographic record

  • Author: Kasturi, Jyotsna
  • Author affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 209
  • Original format: PDF
  • Language: English