Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data

Randall Hulshizer; Eric M Blalock


Abstract

Background: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes.

Results: We developed a four-step post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification and significance assignment. First, 1-way analysis of variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is identified as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies).
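The four steps can be illustrated with a minimal Python sketch. It is not the StatiGen implementation, and several pieces are assumptions made only for illustration: the post hoc tests are replaced by a toy comparison of group means against a fixed `threshold`, genes with no pairwise call stand in for genes that fail the ANOVA filter, the pattern ID is a hypothetical base-3 encoding of the pairwise outcomes, and the Z-score is assumed to take the form (actual - mean of random) / standard deviation of random across simulation runs. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter
from itertools import combinations


def pairwise_outcomes(groups, threshold):
    """Toy stand-in for the post hoc tests (NOT the paper's statistics).

    Compares every pair of group means and returns -1, 0, or +1 per pair:
    first group lower, no call, or first group higher.
    """
    means = [statistics.fmean(g) for g in groups]
    outcomes = []
    for i, j in combinations(range(len(groups)), 2):
        diff = means[i] - means[j]
        if abs(diff) < threshold:
            outcomes.append(0)
        else:
            outcomes.append(1 if diff > 0 else -1)
    return outcomes


def pattern_id(outcomes):
    """Encode the outcomes as one base-3 integer (hypothetical ID scheme)."""
    pid = 0
    for outcome in outcomes:
        pid = pid * 3 + (outcome + 1)  # map -1, 0, +1 to digits 0, 1, 2
    return pid


def pattern_counts(genes, threshold):
    """Count genes per pattern ID; genes with no pairwise call are skipped,
    standing in for genes that would fail the ANOVA filter."""
    counts = Counter()
    for groups in genes:
        outcomes = pairwise_outcomes(groups, threshold)
        if any(outcomes):
            counts[pattern_id(outcomes)] += 1
    return counts


def random_counts(n_genes, group_sizes, threshold, n_runs, rng):
    """Monte Carlo step: pattern counts in pure-noise data, one Counter per run."""
    runs = []
    for _ in range(n_runs):
        genes = [[[rng.gauss(0.0, 1.0) for _ in range(size)] for size in group_sizes]
                 for _ in range(n_genes)]
        runs.append(pattern_counts(genes, threshold))
    return runs


def z_scores(actual, runs):
    """Z = (actual - mean(random)) / sd(random), per pattern (assumed form)."""
    scores = {}
    for pid, observed in actual.items():
        simulated = [run.get(pid, 0) for run in runs]
        sd = statistics.stdev(simulated)
        # Z is undefined when the pattern count never varies in random data.
        scores[pid] = (observed - statistics.fmean(simulated)) / sd if sd > 0 else None
    return scores


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [4, 4, 4]
    # Synthetic data: 300 noise genes plus 30 genes raised only in the third group.
    data = [[[rng.gauss(0, 1) for _ in range(n)] for n in sizes] for _ in range(300)]
    data += [[[rng.gauss(m, 1) for _ in range(4)] for m in (0, 0, 3)] for _ in range(30)]
    actual = pattern_counts(data, threshold=1.5)
    runs = random_counts(len(data), sizes, threshold=1.5, n_runs=100, rng=rng)
    for pid, z in sorted(z_scores(actual, runs).items()):
        print(pid, actual[pid], z)
```

Because each gene receives exactly one ID, the patterns cannot overlap in gene membership. In a real analysis the toy `pairwise_outcomes` function would be replaced by the ANOVA and post hoc tests the abstract describes.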
We wrote a Visual Basic program (StatiGen) that automates the PPM procedure and constructs an Excel workbook with standardized graphs of overrepresented patterns and lists of the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material.

Conclusion: The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit its utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, preserves the data's ability to show the researcher interesting yet unanticipated gene expression patterns, and assigns the majority of ANOVA-significant genes to non-overlapping patterns.
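The remark about the extreme number of theoretical patterns can be made concrete with a rough count. The sketch below assumes that each pairwise comparison has three possible outcomes (lower, no difference, higher), which the abstract does not state explicitly. Under that assumption, k treatment groups give C(k, 2) comparisons and at most 3^C(k, 2) patterns. This is an upper bound, since some combinations of outcomes cannot occur together. The function name `max_patterns` is hypothetical.

```python
from math import comb


def max_patterns(n_groups):
    """Upper bound on pattern count: 3 outcomes for each of C(n, 2) comparisons."""
    return 3 ** comb(n_groups, 2)


# Print group count, number of pairwise comparisons, and the pattern bound.
for k in range(3, 8):
    print(k, comb(k, 2), max_patterns(k))
```

The bound grows from 27 patterns at three groups to 14,348,907 at six and 10,460,353,203 at seven, which is consistent with the reported practical range of three to six treatment groups.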


Bibliographic record

Source: BMC Bioinformatics, 2007, Issue 1
Authors: Randall Hulshizer; Eric M Blalock
Original format: PDF
Chinese Library Classification: Biological sciences
Date added to database: 2022-08-18 05:42:40

