Source: US National Institutes of Health literature, PLoS Clinical Trials

Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms



Abstract

Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoampere-scale currents, which are segmented into step-current level signals representing the nucleotide sequence. Because of experimental and algorithmic factors, these ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current levels. We were interested in developing a simulation model to support a white-box approach to identifying common distortions, rather than relying on the commonly used black-box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing the key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of a priori knowledge of a common, underlying gold-standard RNA/DNA sequence. Using experimental data, we derive a simulation model that provides known squiggle distortion signals to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted but low deleted base rates, generating consistent ×1.7 ratios of squiggle events to called bases. Similar probability density functions (PDFs) and cumulative distribution functions (CDFs) were found across all studies. Experimental PDFs were not the normal distributions that would be expected if squiggle distortion arose from segmentation algorithm artefacts or from individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest that intrinsic sensor limitations are responsible for half of the DTW differences between gold-standard and noisy squiggles.
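To make the quantities named in the abstract concrete, the following is a minimal sketch of a mocked squiggle compared against its gold standard by DTW after z-normalization. It is not the authors' simulation model: the geometric event-repetition rule, the Gaussian noise, the function names (`z_normalize`, `mock_squiggle`, `dtw_distance`) and all parameter values are assumptions chosen only for illustration. Under this assumed rule the expected number of events per base is 1 / (1 - insert_prob), so insert_prob = 0.41 gives a ratio near the ×1.7 reported above. Deletions are not modelled.

```python
import random
import statistics


def z_normalize(signal):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        raise ValueError("cannot z-normalize a constant signal")
    return [(x - mean) / std for x in signal]


def mock_squiggle(levels, insert_prob, noise_sd, rng):
    """Assumed distortion rule (illustrative only): emit each gold-standard
    level once, then keep repeating it while a coin with probability
    insert_prob comes up heads; every event gets Gaussian noise."""
    events = []
    for level in levels:
        events.append(level + rng.gauss(0.0, noise_sd))
        while rng.random() < insert_prob:
            events.append(level + rng.gauss(0.0, noise_sd))
    return events


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Best of: repeat y, repeat x, or advance both signals.
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


rng = random.Random(0)
# Made-up current levels (pA); real levels would come from a k-mer model.
gold = [rng.uniform(60.0, 120.0) for _ in range(200)]
noisy = mock_squiggle(gold, insert_prob=0.41, noise_sd=2.0, rng=rng)

ratio = len(noisy) / len(gold)
distance = dtw_distance(z_normalize(gold), z_normalize(noisy))
print(f"event-to-base ratio: {ratio:.2f}, DTW distance: {distance:.2f}")
```

Because DTW lets one gold-standard level align with several consecutive events, pure insertions without noise cost nothing in this sketch; the remaining distance comes from the added noise and from the shift in mean and variance that repeated events cause under z-normalization.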
