Journal: BioData Mining

An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies

Abstract

Background: Microarray data sets provide relative expression levels for thousands of genes across a comparatively small number of experimental conditions, called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to rank the genes of microarray data sets that are expressed most differently between two biologically different groupings of assays. The method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), the ability to correctly identify important genes.

Results: This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies, based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant, and for one of the test statistics when the gene variance is non-constant.

Conclusions: PM compares quite favorably to CM, with lower FDR and much higher SP. PM can therefore be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure, and it is recommended for use in these applications.
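The two evaluation criteria named in the abstract can be illustrated with a small sketch. It assumes a simulated setting in which the truly differentially expressed genes are known, and it does not reproduce the authors' PCA-based test statistics, which the abstract does not specify. The function name `fdr_and_power` and the gene labels are hypothetical. For a single run the first value is the observed proportion of false discoveries; averaging it over many simulated runs would estimate the FDR.

```python
def fdr_and_power(selected, truly_differential):
    """Return (FDR, SP) for a set of selected genes against known truth.

    FDR = false positives / number of genes selected (0.0 if none selected).
    SP  = true positives / number of truly differential genes.
    """
    selected = set(selected)
    truth = set(truly_differential)
    if not truth:
        raise ValueError("need at least one truly differential gene")
    true_pos = len(selected & truth)    # selected and truly differential
    false_pos = len(selected - truth)   # selected but not differential
    fdr = false_pos / len(selected) if selected else 0.0
    power = true_pos / len(truth)
    return fdr, power


# Hypothetical toy data: 4 truly differential genes, 5 genes selected.
truth = ["g1", "g2", "g3", "g4"]
selected = ["g1", "g2", "g3", "g7", "g9"]
fdr, power = fdr_and_power(selected, truth)
print(f"FDR = {fdr:.2f}, SP = {power:.2f}")  # prints "FDR = 0.40, SP = 0.75"
```

In this toy case two of the five selected genes are false discoveries and three of the four truly differential genes are recovered. A method with lower FDR and higher SP on the same simulated data would be preferred, which is the sense in which the abstract compares PM with CM.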
