Computers & Operations Research

Logic based methods for SNPs tagging and reconstruction

Abstract

SNPs (single nucleotide polymorphisms) are positions in DNA sequences at which individuals differ. Knowing these SNPs is crucial for disease association studies, but even though such positions are few (about 1% of the entire sequence), the cost of extracting the complete information is very high. Recent studies have shown that DNA sequences are structured into blocks of positions that are conserved during evolution and within which the values (alleles) at different loci are strongly correlated. To reduce the cost of extracting SNP information, this block structure suggests limiting the process to a subset of SNPs, the so-called Tag SNPs, which retain most of the information contained in the whole sequence. In this paper, we apply a feature selection technique based on integer programming to the problem of Tag SNP selection. Moreover, to assess the quality of our approach, we also consider the problem of SNP reconstruction, i.e., deriving unknown SNPs from the values of the Tag SNPs, and propose two reconstruction methods: one based on a majority vote and the other on a machine learning approach. We test our algorithm on two public data sets of different nature, obtaining results that, where comparable, are in line with the related literature. An interesting aspect of the proposed method is its ability to handle very large SNP sets simultaneously and, in addition, to provide highly informative reconstruction rules in the form of logic formulas.
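The abstract does not spell out the integer program or the voting rule, so the following is only a rough illustration under stated assumptions. It assumes a set-covering style formulation of Tag SNP selection: choose the fewest SNPs such that every pair of distinct reference haplotypes differs on at least `beta` of the chosen SNPs. For tiny inputs this is solved by exhaustive search in place of an integer programming solver. It then reconstructs a new sample's non-tag SNPs by a simple majority vote among the reference haplotypes that agree with it on every tag. The names `select_tags`, `reconstruct` and `beta`, and the encoding of haplotypes as 0/1 strings, are hypothetical and not the authors' code.

```python
from collections import Counter
from itertools import combinations


def select_tags(haplotypes, beta=1):
    """Hypothetical sketch: return the smallest list of SNP indices such that
    every pair of distinct haplotypes differs on at least `beta` selected SNPs.
    Exhaustive search stands in for an IP solver (small inputs only)."""
    n_snps = len(haplotypes[0])
    distinct = sorted(set(haplotypes))
    pairs = list(combinations(distinct, 2))
    # Try subsets in order of increasing size, so the first hit is minimal.
    for k in range(n_snps + 1):
        for subset in combinations(range(n_snps), k):
            if all(sum(a[j] != b[j] for j in subset) >= beta for a, b in pairs):
                return list(subset)
    return None  # no subset meets the `beta` requirement


def reconstruct(tag_values, tags, haplotypes):
    """Hypothetical sketch: fill in non-tag SNPs by majority vote over the
    reference haplotypes that match `tag_values` on every tag SNP."""
    matches = [h for h in haplotypes
               if all(h[j] == v for j, v in zip(tags, tag_values))]
    if not matches:
        matches = haplotypes  # assumption: fall back to the whole panel
    result = []
    for j in range(len(haplotypes[0])):
        if j in tags:
            # Tag SNPs are observed, so copy them through unchanged.
            result.append(tag_values[tags.index(j)])
        else:
            # Most frequent allele at position j among the matching haplotypes.
            result.append(Counter(h[j] for h in matches).most_common(1)[0][0])
    return "".join(result)
```

For example, with the panel `["00110", "01101", "10011", "00110", "11000"]`, `select_tags(panel)` returns `[0, 1]`, and `reconstruct("01", [0, 1], panel)` yields `"01101"`. On realistic data the covering constraints would be handed to an integer programming solver rather than enumerated, and the majority vote would typically be restricted to a block of correlated loci.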