Published in: Progress in Brain Research

Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.


Abstract

The goal of this chapter is to introduce some of the available computational methods for expression analysis. Genomic and proteomic experimental techniques are briefly discussed to help the reader understand these methods, and their results, in the context of their biological significance. Furthermore, a case study is presented that illustrates the use of these analytical methods to extract significant biomarkers from high-throughput microarray data. Genomic and proteomic data analysis is essential for understanding the underlying factors involved in human disease. Currently, such experimental data are generally obtained by high-throughput microarray or mass spectrometry technologies, among others. The sheer amount of raw data obtained using these methods warrants specialized computational methods for data analysis. Biomarker discovery for neurological diagnosis and prognosis is one such example. By extracting significant genomic and proteomic biomarkers in controlled experiments, we come closer to understanding how biological mechanisms contribute to neurodegenerative diseases such as Alzheimer's and how drug treatments interact with the nervous system. In the biomarker discovery process, several computational methods must be carefully considered to analyze genomic or proteomic data accurately. These methods include quality control, clustering, classification, feature ranking, and validation. Data quality control and normalization methods reduce technical variability and help ensure that discovered biomarkers are statistically significant. Preprocessing steps must be carefully selected since they may adversely affect the results of the subsequent expression analysis steps, which generally fall into two categories: unsupervised and supervised. Unsupervised, or clustering, methods can be used to group similar genomic or proteomic profiles and can therefore elucidate relationships within sample groups.
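As a rough illustration of the preprocessing and clustering steps named above, the following minimal sketch log-transforms and median-centers each sample, then groups samples by average-linkage agglomerative clustering on Euclidean distance. The abstract does not specify which normalization or clustering methods the chapter uses; the choices here, the function names `normalize` and `average_linkage`, and the toy intensities are all assumptions made for illustration.

```python
import math
from statistics import median

def normalize(samples):
    """Log2-transform each sample, then subtract its median (median centering)."""
    out = []
    for values in samples:
        logged = [math.log2(v) for v in values]  # assumes strictly positive intensities
        m = median(logged)
        out.append([x - m for x in logged])
    return out

def average_linkage(profiles, k):
    """Merge the two closest clusters (average Euclidean distance) until k remain."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(math.dist(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        _, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical raw intensities: rows are samples, columns are genes.
raw = [
    [100, 200, 400, 800],    # sample 0
    [210, 390, 820, 1580],   # sample 1: same pattern as 0, roughly twice as bright
    [800, 400, 200, 100],    # sample 2
    [1600, 790, 410, 205],   # sample 3: same pattern as 2, roughly twice as bright
]
groups = average_linkage(normalize(raw), k=2)
print(groups)
```

Median centering removes the overall brightness difference between samples, which is a technical rather than biological effect in this toy data, so the clustering groups samples by expression pattern instead of by scan intensity.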
These methods can also assign biomarkers to subgroups based on their expression profiles across patient samples. Although clustering is useful for exploratory analysis, it is limited by its inability to incorporate expert knowledge. Classification and feature ranking, on the other hand, are supervised, knowledge-based machine learning methods that estimate the distribution of biological expression data and, in doing so, can extract important information about these experiments. Classification is closely coupled with feature ranking, which is essentially a data reduction method that uses classification error estimation or other statistical tests to score features. Biomarkers can subsequently be extracted by eliminating low-ranked features. These analytical methods may be applied equally to genomic and proteomic data. However, because of both biological differences between the data sources and technical differences between the experimental methods used to obtain these data, it is important to have a firm understanding of the data sources and experimental methods. At the same time, regardless of the data quality, it is inevitable that some discovered biomarkers are false positives. Thus, it is important to validate discovered biomarkers. The validation process may be slow; yet the overall biomarker discovery process is significantly accelerated by the initial feature ranking and data reduction steps. Information obtained from the validation process may also be used to refine data analysis procedures for future iterations. Biomarker validation may be performed in a number of ways: at the bench in traditional labs, through web-based electronic resources such as Gene Ontology and literature databases, and in clinical trials.
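The coupling between feature ranking and classification error estimation described above can be sketched as follows. This is a minimal example under stated assumptions, not the chapter's procedure: features are scored with Welch's t-statistic, the classifier is a nearest-centroid rule, and error is estimated by leave-one-out cross-validation. The function names and the toy expression matrix are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two groups (each needs >= 2 values, not all constant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_features(X, y):
    """Feature indices sorted by decreasing |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    return [j for _, j in sorted(scores, reverse=True)]

def nearest_centroid_predict(X_train, y_train, x, features):
    """Assign x to the class whose mean profile on the chosen features is closest."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [mean(r[j] for r in rows) for j in features]
        d = math.dist(centroid, [x[j] for j in features])
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def loo_error(X, y, n_features):
    """Leave-one-out error; features are re-ranked inside every fold."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top = rank_features(X_tr, y_tr)[:n_features]
        errors += nearest_centroid_predict(X_tr, y_tr, X[i], top) != y[i]
    return errors / len(X)

# Hypothetical normalized expression values: rows are samples, columns are features.
X = [
    [5.1, 2.0, 7.3],
    [4.8, 2.3, 6.9],
    [5.3, 1.8, 7.1],
    [5.0, 6.1, 7.2],
    [4.9, 5.8, 6.8],
    [5.2, 6.4, 7.0],
]
y = [0, 0, 0, 1, 1, 1]  # e.g. control vs. disease (labels are made up)

ranking = rank_features(X, y)
error = loo_error(X, y, n_features=1)
print(ranking[0], error)
```

Re-ranking the features inside each cross-validation fold, rather than once on the full data set, keeps the held-out sample from influencing which features are selected; otherwise the error estimate would be optimistically biased. Candidate biomarkers chosen this way would still need the independent validation the abstract describes.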
