Asilomar Conference on Signals, Systems, and Computers

Detecting novel genomic structural variants through negative binomial optimization

Abstract

Structural variants (SVs) are short sequences of DNA, larger than one nucleotide, that can vary between members of the same species. Although SVs are relatively rare compared to single nucleotide variants (SNVs), they are an important source of genetic variation, and some SVs have been associated with diseases and with susceptibility to certain types of cancer. SV detection is commonly performed by aligning sequenced fragments of an individual's genome to a high-quality reference genome. Candidate SVs correspond to discordantly mapped configurations of fragments; however, sequencing errors can also produce discordant mappings, so many candidate SVs are in fact false positives. When sequencing coverage is high, SV detection is more accurate, but this comes at a higher sequencing cost. Sequencing at low coverage reduces cost but increases the error and complexity of SV detection. The goal of our work is to use mathematical optimization to improve SV detection in low-coverage DNA sequencing data. Previous studies of SV detection have modeled coverage with a Poisson distribution, which assumes that the mean and variance are equal. In an effort to more closely model the experimental data, we use the negative binomial distribution, which allows the mean and variance to differ and contains the Poisson distribution as a special case. Our approach also controls false positive predictions by considering SV prediction in a parent and a child simultaneously. We assume that most SVs carried by a child are inherited from a parent, but that a small fraction may be novel to the child. We balance the rarity of novel versus inherited SVs by enforcing sparsity through an l1-penalty, and we compare this negative binomial reconstruction algorithm to the Poisson reconstruction algorithm by testing both on the same simulated data sets.
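To make the distributional argument concrete, the following minimal Python sketch compares the two count models. It is an illustration under stated assumptions, not the authors' algorithm: the mean/dispersion parameterization (`mu`, `r`), the function names, and the form of the penalized objective (negative log-likelihood plus `tau` times the l1 norm of a hypothetical signal vector) are all chosen here for illustration and do not come from the abstract.

```python
import math

def nb_logpmf(y, mu, r):
    """Log-probability of count y under a negative binomial with
    mean mu > 0 and dispersion r > 0 (assumed parameterization)."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def poisson_logpmf(y, mu):
    """Log-probability of count y under a Poisson with mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def nb_variance(mu, r):
    """Variance under this parameterization: mu + mu^2 / r, always above the mean."""
    return mu + mu * mu / r

def l1_penalized_nll(counts, means, signal, tau, r):
    """Hypothetical objective: negative binomial negative log-likelihood
    of the observed counts plus an l1 penalty on the signal vector."""
    nll = -sum(nb_logpmf(y, mu, r) for y, mu in zip(counts, means))
    return nll + tau * sum(abs(f) for f in signal)

if __name__ == "__main__":
    mu = 3.0
    # Overdispersion: the variance exceeds the mean for finite r.
    print(nb_variance(mu, r=2.0))  # 7.5
    # Poisson as a special case: the gap shrinks as r grows.
    for r in (2.0, 200.0, 1e6):
        print(r, nb_logpmf(5, mu, r) - poisson_logpmf(5, mu))
```

In this parameterization the variance is `mu + mu**2 / r`, so a small `r` models overdispersed coverage and letting `r` grow recovers the Poisson model, which is the sense in which the Poisson distribution is a special case.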
