Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Kurt Serkan; Oz Ersoy; Askin Oykum Esra; Oz Yeliz Yucel

Abstract

Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of the peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for the efficient use of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files, and compared their prediction performance. To test and develop our ideas, we used a public DNA trace repository consisting of 1626 high-quality and 631 low-quality files labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
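The abstract does not name the four features or the model settings, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical per-base score (how strongly the highest of the four channel peaks dominates the runner-up), summarizes each trace file with two hypothetical file-level features, and fits a plain logistic regression by batch gradient descent. The function names `peak_dominance`, `trace_features`, `fit_logistic` and `predict_high_quality`, the 0.5 threshold, and the toy traces are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def peak_dominance(peaks: Sequence[float]) -> float:
    """Hypothetical per-base score: 1 - (second-highest / highest) over the
    four channel peak heights (A, C, G, T) at one base position.
    Near 1 = one clear peak; near 0 = ambiguous call."""
    top, second = sorted(peaks, reverse=True)[:2]
    if top <= 0:
        return 0.0
    return 1.0 - second / top


def trace_features(trace: Sequence[Sequence[float]], threshold: float = 0.5) -> List[float]:
    """Summarize a non-empty trace as two hypothetical file-level features:
    mean peak dominance, and fraction of positions scoring above `threshold`."""
    scores = [peak_dominance(p) for p in trace]
    mean_score = sum(scores) / len(scores)
    frac_clear = sum(s > threshold for s in scores) / len(scores)
    return [mean_score, frac_clear]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss.
    Labels: 1 = high/acceptable quality, 0 = low quality."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_high_quality(x: List[float], w: List[float], b: float,
                         cutoff: float = 0.5) -> bool:
    """Classify a file as high/acceptable quality if P(y=1 | x) >= cutoff."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= cutoff


# Synthetic toy traces (illustration only): each row is [A, C, G, T] peak heights.
clean = [[900, 40, 30, 20], [50, 880, 60, 10], [20, 30, 940, 25], [15, 25, 35, 910]]
noisy = [[400, 350, 300, 280], [300, 320, 310, 290], [500, 450, 60, 40], [200, 210, 190, 205]]

w, b = fit_logistic([trace_features(clean), trace_features(noisy)], [1, 0])
print(predict_high_quality(trace_features(clean), w, b))  # True
print(predict_high_quality(trace_features(noisy), w, b))  # False
```

The model here is trained and evaluated on the same two toy traces, so it says nothing about real accuracy. In practice a library implementation (for example scikit-learn's `LogisticRegression`, `RandomForestClassifier` or `AdaBoostClassifier`) would normally replace the hand-written optimizer; the pure-Python version only keeps the sketch self-contained.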

Bibliographic details

Source
Neural computing & applications | 2018, No. 8 | 12 pages
Authors
Kurt Serkan; Oz Ersoy; Askin Oykum Esra; Oz Yeliz Yucel
Affiliations
Yildiz Tech Univ, Fac Elect & Elect Engn, Dept Elect & Commun Engn, Istanbul, Turkey
Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
Istanbul Tech Univ, Mol Biol Biotechnol, Istanbul, Turkey
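Indexing information
Original format: PDF
Text language: English
CLC classification: artificial neural network computers; artificial intelligence theory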
Keywords
DNA sequencing; Decision tree; Ensemble learning algorithms; Logistic regression

Similar literature

1. Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches [J]. Kurt Serkan, Oz Ersoy, Askin Oykum Esra. Neural computing & applications, 2018, No. 8.
2. Combining Logistic Regression with Classification and Regression Tree to Predict Quality of Care in a Home Health Nursing Data Set [J]. Huey-Ming Guo, Yea-Ing Lotus Shyu, Her-Kun Chang. Studies in Health Technology and Informatics, 2006.
3. Machine Learning Methods of Kernel Logistic Regression and Classification and Regression Trees for Landslide Susceptibility Assessment at Part of Himalayan Area, India [J]. Binh Thai Pham, Indra Prakash. Indian Journal of Science and Technology, 2018, No. 12.
4. Combining Logistic Regression with Classification and Regression Tree to Predict Quality of Care in a Home Health Nursing Data Set [C]. Huey-Ming Guo, Yea-Ing Lotus Shyu, Her-Kun Chan. International Congress on Nursing Informatics, 2006.
5. Rapid Classification of NifH Protein Sequences using Classification and Regression Trees [D]. Frank, Ildiko. 2014.
6. Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis [O]. Azam Rastegari, Ali Akbar Haghdoost, Mohammad Reza Baneshi. 2013.
7. A decision support tool for predicting patients at risk of readmission: a comparison of classification trees, logistic regression, generalized additive models, and multivariate adaptive regression splines [O]. Demir, Eren. 2014.
