
Clustering raw distributions of intensities from Affymetrix gene expression microarrays in order to evaluate statistical preprocessing methods.


Abstract

Gene expression microarrays have been used for approximately ten years to elucidate the genetic mechanisms behind common diseases and biological processes, and a large body of research addresses the analysis of such data. Because non-biological noise is prevalent, the data typically must be preprocessed before statistical testing. This research clusters raw distributions of intensities from Affymetrix gene expression microarrays in order to determine which properties of the data affect the performance of statistical preprocessing methods. Its novelty lies in the following. First, information is gathered on the distribution of intensities across a variety of Affymetrix gene expression microarray data sets. A new clustering method, based on a visual definition of the shape of the distributions, is developed to cluster the array experiments by their statistical distributions. Next, various preprocessing pipelines are applied to each data set within each distribution cluster to determine whether there is a one-to-one correspondence between distribution cluster and the performance of the pipelines. The area under the receiver operating characteristic curve (AUC) is used to measure performance on spike-in data sets. Significant Gene Ontology (GO) terms and the intra-class correlation coefficient (ICC) are used to evaluate performance on real gene expression data sets. Finally, we study whether, for the data sets assigned to a given class, there is a best preprocessing pipeline that can be applied generally and that yields the most biologically meaningful results.
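To illustrate the spike-in evaluation mentioned in the abstract, here is a minimal sketch of how an AUC could be computed for one preprocessing pipeline. It is not taken from the thesis: the function name `auc_from_scores`, the per-probe-set `scores` (for example, absolute test statistics after preprocessing), and the `is_spiked` truth labels are all hypothetical. The sketch uses the standard rank-sum (Mann-Whitney) identity for AUC, with average ranks for ties.

```python
def auc_from_scores(scores, is_spiked):
    """Return the AUC of `scores` against boolean truth labels `is_spiked`.

    Hypothetical helper: uses the rank-sum identity
    AUC = (R_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg),
    where R_pos is the sum of the ranks of the spiked-in probe sets.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for flag in is_spiked if flag)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one spiked and one non-spiked probe set")

    rank_sum_pos = sum(r for r, flag in zip(ranks, is_spiked) if flag)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data (made up for illustration): four probe sets, two of them spiked in.
toy_scores = [0.9, 0.3, 0.8, 0.1]
toy_truth = [True, True, False, False]
toy_auc = auc_from_scores(toy_scores, toy_truth)
```

Under these assumptions, a pipeline whose scores rank every spiked-in probe set above every non-spiked one would reach an AUC of 1, while uninformative scores would give about 0.5, so AUC values could be compared across pipelines within a distribution cluster.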

Bibliographic record

  • Author: Zou, Kun
  • Author affiliation: Southern Methodist University
  • Degree-granting institution: Southern Methodist University
  • Subject: Biology; Biostatistics; Statistics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 199
  • Original format: PDF
  • Language of text: English
