Journal: Mathematical Biosciences: An International Journal

From genome-scale data to models of infectious disease: A Bayesian network-based strategy to drive model development


Abstract

High-throughput, genome-scale data present a unique opportunity to link host to pathogen on a molecular level. Forging such connections will help drive the development of mathematical models to better understand and predict both pathogen behavior and the epidemiology of infectious diseases, including malaria. However, the datasets that can aid in identifying these links and models are vast and not amenable to simple, reductionist, and univariate analyses. These datasets require data mining in order to identify the truly important measurements that best describe clinical and molecular observations. Moreover, these datasets typically have relatively few samples due to experimental limitations (particularly for human studies or in vivo animal experiments), making data mining extremely difficult. Here, after first providing a brief overview of common strategies for data reduction and identification of relationships between variables for inclusion in mathematical models, we present a new generalized strategy for performing these data reduction and relationship inference tasks. Our approach emphasizes the importance of robustness when using data to drive model development, particularly when using genome-scale, small-sample in vivo data. We identify the use of appropriate feature reduction combined with data permutations and subsampling strategies as being critical to enable increasingly robust results from network inference using high-dimensional, low-observation data. (C) 2015 Elsevier Inc. All rights reserved.
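
As a rough illustration of the ideas the abstract names (feature reduction, subsampling, and data permutation for robustness), the sketch below scores how often a pairwise relationship reappears across random subsamples and compares that with a permuted null. It is not the authors' method: thresholded Pearson correlation stands in for Bayesian network inference, variance ranking stands in for feature reduction, and every function name, threshold, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def top_variance_features(X, k):
    """Return sorted indices of the k highest-variance columns.
    (Assumed stand-in for a real feature-reduction step.)"""
    order = np.argsort(X.var(axis=0))[::-1][:k]
    return np.sort(order)

def edge_frequencies(X, threshold=0.6, n_subsamples=200, frac=0.8, seed=0):
    """Fraction of row subsamples in which |corr(i, j)| exceeds threshold.
    (Correlation is a simplification, not Bayesian network learning.)"""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = min(n, max(3, int(frac * n)))  # rows drawn per subsample
    counts = np.zeros((p, p))
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=m, replace=False)
        corr = np.corrcoef(X[rows], rowvar=False)
        counts += np.abs(corr) > threshold
    np.fill_diagonal(counts, 0)  # ignore self-relationships
    return counts / n_subsamples

def permuted_null(X, seed=1, **kwargs):
    """Repeat the procedure after shuffling each column independently,
    which destroys real dependencies between variables."""
    rng = np.random.default_rng(seed)
    Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    return edge_frequencies(Xp, **kwargs)

# Synthetic high-dimensional, low-observation data: 20 samples, 50 variables.
# Columns 0 and 1 are strongly related; the rest are low-variance noise.
rng = np.random.default_rng(42)
X = 0.5 * rng.normal(size=(20, 50))
X[:, 0] = 3.0 * rng.normal(size=20)
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=20)

keep = top_variance_features(X, k=10)  # reduce 50 variables to 10
freq = edge_frequencies(X[:, keep])    # subsampling stability
null = permuted_null(X[:, keep])       # permutation baseline

print("kept features:", keep)
print("stability of the planted relationship:", freq[0, 1])
print("same pair under permutation:", null[0, 1])
```

A relationship that recurs in nearly every subsample but rarely under permutation is a more credible candidate for inclusion in a downstream mathematical model than one that appears in a single fit to the full dataset.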