Conference: Evolutionary computation, machine learning and data mining in bioinformatics

A Model Free Method to Generate Human Genetics Datasets with Complex Gene-Disease Relationships

Abstract

A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than from single-component failures. This greatly complicates both the task of selecting informative genetic variations and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Until now, these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. Specifically, we generate six hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variations and of pairs of genetic variations is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This could improve our ability to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 56,600 datasets are available from.
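The two competing objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how predictiveness is measured, so the sketch assumes it is the training accuracy of a majority-vote rule over joint genotypes. The function names (`combo_accuracy`, `best_accuracy`, `objectives`, `dominates`, `pareto_front`) are hypothetical, and the evolution strategy that mutates candidate datasets is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def combo_accuracy(genotypes, status, cols):
    """Accuracy of a majority-vote rule over the joint genotype at `cols`.

    Assumption: this stands in for "predictiveness"; the abstract does not
    name the measure actually used.
    """
    cells = defaultdict(Counter)
    for row, label in zip(genotypes, status):
        cells[tuple(row[c] for c in cols)][label] += 1
    # Each genotype cell predicts its most common label.
    correct = sum(max(counts.values()) for counts in cells.values())
    return correct / len(status)


def best_accuracy(genotypes, status, order):
    """Best accuracy over all combinations of `order` variations."""
    n_variations = len(genotypes[0])
    return max(
        combo_accuracy(genotypes, status, cols)
        for cols in combinations(range(n_variations), order)
    )


def objectives(genotypes, status, high_order=3):
    """Return (low, high): low-order score to minimize, high-order to maximize."""
    low = max(best_accuracy(genotypes, status, 1),
              best_accuracy(genotypes, status, 2))
    high = best_accuracy(genotypes, status, high_order)
    return low, high


def dominates(a, b):
    """True if score pair `a` is at least as good as `b` on both objectives
    and differs from it (minimize the first entry, maximize the second)."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def pareto_front(scores):
    """Keep the score pairs that no other pair dominates."""
    return [p for p in scores if not any(dominates(q, p) for q in scores)]


# Toy dataset: disease status is the parity of three binary variations,
# so no single variation or pair is informative but the triple is.
toy_genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_status = [sum(row) % 2 for row in toy_genotypes]
toy_scores = objectives(toy_genotypes, toy_status, high_order=3)
```

On the toy parity dataset, every single variation and every pair scores at chance level while the three-way combination classifies all samples correctly, which is the kind of gene-disease relationship the abstract aims to generate. In a full run, `pareto_front` would be applied to the score pairs of all candidate datasets to keep the non-dominated ones.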
