Published in: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops

In search of true reads: A classification approach to next generation sequencing data selection

Abstract

Next generation sequencing (NGS) technology has increasingly become the backbone of transcriptomics analysis, but sequencer errors bias the read counts. In this paper we establish a framework for predicting true sequences from NGS data, formulated as a classification problem. We define several features, such as the log-likelihood ratio of the estimated true counts, the error probability, and the observed count of each read. Using a Support Vector Machine (SVM) classifier, we show that on simulated reads these features achieve 96.35% classification accuracy in discriminating true sequences. Using this framework, users can select sequences with a desired precision and recall for their analysis. The feature generation software and the simulated data set are available at http://seq.cbrc.jp/NGSFeatGen.
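
The abstract names three per-read features without defining them in detail. The following is a minimal Python sketch under stated assumptions, not the authors' NGSFeatGen software. It assumes Phred+33 base qualities and computes the error probability as the chance that at least one base is miscalled (treating base errors as independent). It takes the observed count to be the number of identical reads, and uses a hypothetical Poisson log-likelihood ratio between an estimated true count and an estimated error-derived count, both of which the caller must supply. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def phred_error_probability(quality: str, offset: int = 33) -> float:
    """Probability that at least one base is miscalled, assuming independent
    per-base errors with P(error) = 10^(-Q/10) and Phred+33 encoding."""
    p_all_correct = 1.0
    for ch in quality:
        q = ord(ch) - offset
        p_all_correct *= 1.0 - 10.0 ** (-q / 10.0)
    return 1.0 - p_all_correct


def poisson_llr(observed: int, lam_true: float, lam_error: float) -> float:
    """Hypothetical feature: log P(k | Poisson(lam_true)) - log P(k | Poisson(lam_error)).
    The log(k!) terms cancel, leaving k*log(lam_true/lam_error) - (lam_true - lam_error)."""
    return observed * math.log(lam_true / lam_error) - (lam_true - lam_error)


def read_features(
    reads: Iterable[Tuple[str, str]],
    est_true: Dict[str, float],
    est_error: Dict[str, float],
) -> Dict[str, Tuple[float, float, float]]:
    """Map each distinct sequence to (llr, mean error probability, observed count).
    `reads` yields (sequence, quality) pairs; est_true / est_error are
    caller-supplied count estimates (assumed, not defined in the abstract)."""
    counts: Counter = Counter()
    err_probs: Dict[str, list] = defaultdict(list)
    for seq, qual in reads:
        counts[seq] += 1
        err_probs[seq].append(phred_error_probability(qual))
    features = {}
    for seq, count in counts.items():
        mean_err = sum(err_probs[seq]) / len(err_probs[seq])
        llr = poisson_llr(count, est_true[seq], est_error[seq])
        features[seq] = (llr, mean_err, float(count))
    return features


# Toy example: two high-quality copies of ACGT, one low-quality ACGA.
reads = [("ACGT", "IIII"), ("ACGT", "IIII"), ("ACGA", "III#")]
feats = read_features(reads, {"ACGT": 2.0, "ACGA": 0.5}, {"ACGT": 0.1, "ACGA": 1.0})
for seq, (llr, p_err, count) in sorted(feats.items()):
    print(seq, round(llr, 3), round(p_err, 4), count)
```

In this toy input the repeated, high-quality read gets a positive log-likelihood ratio and a low error probability, while the singleton with a poor final base gets the opposite, which is the kind of separation a classifier can exploit.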
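
The classification and selection steps described in the abstract can likewise be sketched in plain Python. This sketch assumes a linear SVM trained with Pegasos-style stochastic sub-gradient descent, which substitutes for whatever SVM implementation and kernel the authors actually used. It then picks a decision-score cutoff that maximizes recall subject to a user-chosen minimum precision. `train_linear_svm`, `decision_score`, and `threshold_for_precision` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, iterations: int = 2000,
                     seed: int = 0) -> List[float]:
    """Pegasos-style linear SVM: stochastic sub-gradient descent on the
    L2-regularized hinge loss. Labels must be -1 or +1. A constant 1.0 is
    appended to each vector so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)           # step size 1/(lambda * t)
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        shrink = 1.0 - eta * lam        # effect of the regularizer's gradient
        if margin < 1.0:                # hinge loss active: move toward y*x
            w = [shrink * wi + eta * label * xi for wi, xi in zip(w, x)]
        else:
            w = [shrink * wi for wi in w]
    return w


def decision_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score; > 0 means 'true read' under this sketch."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def threshold_for_precision(scores: Sequence[float], labels: Sequence[int],
                            target_precision: float
                            ) -> Optional[Tuple[float, float, float]]:
    """Lowest cutoff whose selection (score >= cutoff) reaches target precision.
    Returns (cutoff, precision, recall), or None if no cutoff qualifies."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for label in labels if label == 1)
    tp = fp = 0
    best = None
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only after every read tied at this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = tp / (tp + fp)
        if precision >= target_precision:
            recall = tp / total_pos if total_pos else 0.0
            best = (score, precision, recall)
    return best


X = [(2, 2), (3, 1), (2.5, 3), (-2, -1), (-1, -3), (-3, -2)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([decision_score(w, x) > 0 for x in X])  # [True, True, True, False, False, False]
print(threshold_for_precision([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, -1, 1, -1], 0.75))  # (0.4, 0.75, 1.0)
```

Lowering the cutoff admits more reads, raising recall at the cost of precision; returning the lowest cutoff that still meets the target keeps as many true reads as that precision level allows.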
