Journal of computational biology: A journal of computational molecular cell biology

A Model-Based Approach to Gene Clustering with Missing Observation Reconstruction in a Markov Random Field Framework


Abstract

The measurement techniques used to interrogate biological systems make it possible to monitor the behavior of virtually all cell components at different scales and from complementary angles. However, the data generated in these experiments are difficult to interpret. A first difficulty arises from the high dimensionality and inherent noise of such data. Organizing them into meaningful groups is therefore highly desirable for improving our knowledge of biological mechanisms, and a more accurate picture can be obtained by accounting for dependencies between the components (e.g., genes) under study. A second difficulty arises from the fact that biological experiments often produce missing values. When it is not simply ignored, this issue has been addressed by imputing the expression matrix before applying traditional analysis methods. Although helpful, this practice can lead to unsound results. In this paper we propose a statistical methodology that integrates individual dependencies in a missing data framework. More explicitly, we present a clustering algorithm that deals with incomplete data in a Hidden Markov Random Field context. This tackles the missing value issue in a probabilistic framework and still allows us to reconstruct missing observations a posteriori without imposing any pre-processing of the data. Experiments on synthetic data validate the gain from using our method, and analysis of real biological data shows its potential to extract biological knowledge.
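The idea of clustering incomplete data without prior imputation, and only then reconstructing the missing entries, can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes diagonal Gaussian clusters, a Potts-like neighbor bonus weighted by a hypothetical parameter `beta`, and a simple mean-field-style update. The function name `hmrf_cluster`, the `None` encoding of missing values, and the deterministic initialization are all assumptions made for illustration.

```python
import math

def hmrf_cluster(X, neighbors, K, beta=1.0, n_iter=20, var_floor=1e-2):
    """Illustrative sketch only (not the paper's algorithm).
    X: rows of floats, with None marking a missing value.
    neighbors: neighbors[i] lists the row indices linked to row i.
    Assumes every column has at least one observed value."""
    n, d = len(X), len(X[0])
    # Column means over observed entries, used only to initialise centres.
    col_mean = []
    for j in range(d):
        obs = [row[j] for row in X if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    # Deterministic initialisation: K evenly spaced rows as starting centres.
    mu = [[X[k * n // K][j] if X[k * n // K][j] is not None else col_mean[j]
           for j in range(d)] for k in range(K)]
    var = [[1.0] * d for _ in range(K)]
    resp = [[1.0 / K] * K for _ in range(n)]

    for _ in range(n_iter):
        # E-step: missing dimensions are marginalised out, which for a
        # diagonal Gaussian means they are simply skipped.
        new_resp = []
        for i in range(n):
            logp = []
            for k in range(K):
                # Neighbor bonus from the previous soft assignments.
                s = beta * sum(resp[m][k] for m in neighbors[i])
                for j in range(d):
                    if X[i][j] is not None:
                        s -= 0.5 * (math.log(2 * math.pi * var[k][j])
                                    + (X[i][j] - mu[k][j]) ** 2 / var[k][j])
                logp.append(s)
            top = max(logp)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in logp]
            z = sum(w)
            new_resp.append([v / z for v in w])
        resp = new_resp
        # M-step: weighted means and variances over observed entries only.
        for k in range(K):
            for j in range(d):
                obs = [i for i in range(n) if X[i][j] is not None]
                wsum = sum(resp[i][k] for i in obs)
                if wsum < 1e-9:
                    continue  # cluster k has no support here; keep old values
                mu[k][j] = sum(resp[i][k] * X[i][j] for i in obs) / wsum
                v = sum(resp[i][k] * (X[i][j] - mu[k][j]) ** 2
                        for i in obs) / wsum
                var[k][j] = max(v, var_floor)

    # A posteriori reconstruction: posterior-weighted cluster means
    # replace the missing entries; observed entries are left untouched.
    X_hat = [[X[i][j] if X[i][j] is not None
              else sum(resp[i][k] * mu[k][j] for k in range(K))
              for j in range(d)] for i in range(n)]
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return labels, X_hat
```

A call such as `labels, X_hat = hmrf_cluster(X, neighbors, K=2)` returns hard cluster labels and a completed copy of the data. The missing entries never enter the likelihood during fitting; they are filled in only at the end, which mirrors the "no pre-processing" point made in the abstract.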
