IEEE International Conference on Bioinformatics and Biomedicine

Reconstruction of high read-depth signals from low-depth whole genome sequencing data using deep learning

Abstract

Motivation: Next-generation sequencing (NGS) technologies based on DNA, RNA, or methylation sequencing are the prevailing tools of modern genome research. For DNA sequencing, whole genome sequencing (WGS) and whole exome sequencing (WES) are two typical applications that strike different trade-offs between sequencing depth and base coverage. Although sequencing costs have fallen greatly, the sequencing depth used in WGS is typically lower than that used in WES (e.g., roughly 35× vs. 100× or more). In addition, biases and batch effects may arise at different stages of an NGS experiment. Downstream analyses that rely on low-depth, biased WGS data are more sensitive to these biases, which makes it even harder to uncover real biological signals in the data. In this work, we focus on reconstructing high read-depth signals from low-depth WGS data. We use a pair of WGS datasets with different read depths for the same sample and learn a mapping from low-depth to high-depth signals on the given platform.

Results: We explored three reconstruction models, ranging from shallow to deep. Our experimental results show that, when only read-depth information is used, deeper models do not perform substantially better than a linear regression model. Incorporating additional information, such as GC content, mappability, and nucleotide sequence, further improves the performance of convolutional neural network (CNN) models. We used the reconstructed read-depth signals in a downstream analysis to identify copy number variation segments for a single sample. The experimental results show that segments that are not detected using low-depth data can be detected from the signals reconstructed by the CNN model with the extra biological information.
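The idea of learning a low-depth to high-depth mapping from paired data can be illustrated with the simplest model family mentioned above, a linear regression on read depth alone. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, and the bin count, mean depth, downsampling fraction, duplicated segment, and all function names (`simulate_depth_pair`, `fit_linear_map`, `reconstruct`) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_depth_pair(n_bins=2000, high_mean=100.0, fraction=0.35, seed=0):
    """Simulate per-bin read counts for one sample at two depths.

    Purely synthetic and illustrative: the abstract does not describe
    how the paired data were produced.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical copy-ratio profile: baseline with one duplicated segment.
    copy_ratio = np.ones(n_bins)
    copy_ratio[800:1000] = 1.5
    high_counts = rng.poisson(high_mean * copy_ratio)
    # Assumption: the low-depth signal is a random thinning of the
    # high-depth reads (each read is kept with probability `fraction`).
    low_counts = rng.binomial(high_counts, fraction)
    return low_counts.astype(float), high_counts.astype(float)


def fit_linear_map(low, high):
    """Least-squares fit of high ~ slope * low + intercept."""
    design = np.column_stack([low, np.ones_like(low)])
    coef, *_ = np.linalg.lstsq(design, high, rcond=None)
    return coef  # array of [slope, intercept]


def reconstruct(low, coef):
    """Apply the fitted linear map to low-depth bins."""
    return coef[0] * low + coef[1]


low, high = simulate_depth_pair()
train, test = slice(0, 1500), slice(1500, 2000)

# Fit on one part of the genome, evaluate on held-out bins.
coef = fit_linear_map(low[train], high[train])
pred = reconstruct(low[test], coef)
rmse = float(np.sqrt(np.mean((pred - high[test]) ** 2)))
print(f"slope={coef[0]:.3f} intercept={coef[1]:.3f} held-out RMSE={rmse:.3f}")
```

A deeper model would replace `fit_linear_map` with a network that also receives per-bin covariates such as GC content and mappability; those inputs are left out here because the abstract gives no detail on how they are encoded.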
