Journal: IEEE Transactions on Signal Processing

Maximum-Parsimony Haplotype Inference Based on Sparse Representations of Genotypes

Abstract

The haplotypes of an individual can be used to predict diseases and help design drugs. However, experimentally determining haplotypes is expensive and time-consuming, so genotypes are usually measured instead. Given the set of genotypes for a group of unrelated individuals, it is possible to infer the haplotype pair for each subject based on the maximum parsimony principle. Finding the exact solution to this problem is NP-hard. We propose two related formulations of the haplotype inference problem that translate the maximum parsimony principle into the sparse representation of genotypes. In the first formulation, we look for the set of haplotypes that explains the genotypes such that the resulting frequency vector of haplotypes is as sparse as possible. The sparseness condition is achieved by minimizing the Tsallis entropy of the frequency vector, which is still an NP-hard problem. We propose a method that enumerates all local minima with high probability by solving a set of integer linear programs of low dimensionality. The minimizer is then found by identifying the local minimum that achieves the lowest Tsallis entropy. In the second formulation, we state haplotype inference as a sparse dictionary selection problem. Each genotype is reconstructed by a haplotype pair selected from a set of available haplotypes, and this set is required to be sparse. This leads to an approximately submodular maximization problem, which can therefore be solved with a fast greedy method. We test the proposed solutions on different data sets and compare their performance with state-of-the-art methods, achieving similar or better results.
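To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it does not solve the integer linear programs or the dictionary selection problem the abstract describes. It only shows (a) the Tsallis entropy of a haplotype frequency vector, which is lower for sparser vectors, and (b) a simple greedy parsimony heuristic that resolves each genotype with the pair adding the fewest new haplotypes. The encoding (haplotype sites 0/1, genotype sites 0/1/2 with 1 meaning heterozygous), the function names, and the entropy index `q=2.0` are all assumptions made for illustration.

```python
from itertools import product


def tsallis_entropy(freqs, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1), for q != 1.

    The value of q is an illustrative choice, not taken from the paper.
    """
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit; use q != 1 here")
    return (1.0 - sum(p ** q for p in freqs if p > 0)) / (q - 1.0)


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs (h1, h2) with h1 + h2 == genotype.

    Assumed encoding: genotype sites are 0 or 2 (homozygous) or 1
    (heterozygous); haplotype sites are 0 or 1.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous sites take opposite alleles
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def greedy_parsimony(genotypes):
    """Hypothetical greedy heuristic: least ambiguous genotypes first, and for
    each one pick the compatible pair that adds the fewest new haplotypes."""
    selected = set()
    chosen = {}
    order = sorted(range(len(genotypes)),
                   key=lambda i: sum(1 for g in genotypes[i] if g == 1))
    for i in order:
        pairs = sorted(compatible_pairs(genotypes[i]))  # sorted for determinism
        best = min(pairs, key=lambda p: len(set(p) - selected))
        selected.update(best)
        chosen[i] = best
    return selected, [chosen[i] for i in range(len(genotypes))]


def haplotype_frequencies(pairs):
    """Relative frequency of each haplotype across all resolved pairs."""
    counts = {}
    for pair in pairs:
        for h in pair:
            counts[h] = counts.get(h, 0) + 1
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


if __name__ == "__main__":
    genotypes = [(2, 0), (0, 2), (1, 1)]
    selected, pairs = greedy_parsimony(genotypes)
    freqs = haplotype_frequencies(pairs)
    print("haplotypes used:", sorted(selected))
    print("Tsallis entropy:", tsallis_entropy(freqs.values()))
```

In this toy input the third genotype can be resolved either with two haplotypes already in use or with two new ones. The heuristic prefers the former, which keeps the frequency vector sparse and its Tsallis entropy low. Exhaustive enumeration in `compatible_pairs` grows exponentially with the number of heterozygous sites, so this sketch only suits short genotypes.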
