In search of true reads: A classification approach to next generation sequencing data selection

Abstract

Next generation sequencing (NGS) technology has increasingly become the backbone of transcriptomics analysis, but sequencer errors bias the read counts. In this paper we establish a framework for predicting true sequences from NGS data. We formulate this task as a classification problem. We define several features, such as the log-likelihood ratio of estimated true counts, the error probability, and the observed count of the reads. Using a Support Vector Machine (SVM) classifier, we show that on simulated reads these features achieve 96.35% classification accuracy in discriminating true sequences. Using this framework, we provide a way for users to select sequences with a desired precision and recall for their analysis. The feature generation software and the simulated data set can be obtained from http://seq.cbrc.jp/NGSFeatGen.
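The abstract's last step, selecting sequences at a desired precision and recall, can be illustrated with a minimal sketch. It assumes each candidate sequence already has a classifier decision score (for example, an SVM margin) and a known true/false label from simulated reads. The function name `pick_threshold` and the sample values are hypothetical and are not taken from the authors' software.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scores: List[float], labels: List[bool], target_precision: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the lowest score threshold
    whose precision meets the target, or None if no threshold does.

    A sequence counts as selected when its score >= threshold.
    Hypothetical helper for illustration only.
    """
    total_true = sum(labels)
    if total_true == 0:
        return None  # recall is undefined without any true sequence

    # Rank sequences from the most to the least confident score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    best = None
    true_positives = 0
    for i, (score, is_true) in enumerate(ranked, start=1):
        true_positives += is_true
        # Evaluate only at the end of a run of tied scores, because a
        # threshold cannot separate sequences that share one score.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = true_positives / i
        recall = true_positives / total_true
        if precision >= target_precision:
            # Later (lower) thresholds overwrite earlier ones, so the
            # result is the qualifying threshold with the highest recall.
            best = (score, precision, recall)
    return best


# Made-up decision scores and labels for six candidate sequences.
example_scores = [2.1, 1.7, 1.2, 0.4, -0.3, -1.5]
example_labels = [True, True, False, True, False, False]
print(pick_threshold(example_scores, example_labels, 0.75))
```

Raising `target_precision` moves the threshold up and usually lowers recall, which is the trade-off a user would weigh when choosing which sequences to keep.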

Bibliographic record

Source
2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, 2010, pp. 561-566 (6 pages)
Original format: PDF
Chinese Library Classification: Information processing technology
Keywords
Illumina; Solexa; classification; expectation maximization; next generation sequencing; transcriptomics;

Similar literature

1. A sensitive short read homology search tool for paired-end read sequencing data [J]. Prapaporn Techa-Angkoon, Yanni Sun, Jikai Lei. BMC Bioinformatics, 2017, no. 12.
2. A classification approach for DNA methylation profiling with bisulfite next-generation sequencing data [J]. Cheng Longjie, Zhu Yu. Bioinformatics, 2014, no. 2.
3. grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories [J]. Taylor Louis J., Abbas Arwa, Bushman Frederic D. Bioinformatics, 2020, no. 11.
4. In search of true reads: A classification approach to next generation sequencing data selection [C]. {missing}. IEEE International Conference on Bioinformatics and Biomedicine Workshop, 2010.
5. An Eulerian Path approach to next-generation DNA sequencing with pre-sorted reads [D]. Barker, Darlene Fleming. 2013.
6. A sensitive short read homology search tool for paired-end read sequencing data [O]. Prapaporn Techa-Angkoon, Yanni Sun, Jikai Lei. 2017.
7. Supplementary Information : Hogan, Holland, Holloway, Petit and Read : Read Classification for Next Generation Sequencing, ESANN 2013, April 2013 [O]. Hogan James M. 100
