Journal of Medical Imaging and Health Informatics

Comparative Analysis of Machine Learning Algorithms for Mycobacterium Tuberculosis Protein Sequences on the Basis of Physicochemical Parameters

Abstract

The genus Mycobacterium is best known for its two major pathogenic species, M. tuberculosis and M. leprae, the causative agents of tuberculosis and leprosy, two of the world's oldest diseases. M. tuberculosis kills approximately two million people each year and is thought to latently infect one-third of the world's population. The proteins of these two organisms differ considerably but also share some characteristics. The wet lab experiments usually used to classify their proteins are expensive, labor intensive and time consuming, so there is a need for a computational approach to classifying proteins of M. tuberculosis H37Rv and M. leprae. Computational approaches are fast and economical compared with wet lab techniques. This paper attempts to relate the proteins of each organism to their physicochemical parameters and to predict the source organism with fair accuracy. Three classifiers were used: a support vector machine (SVM) trained with Sequential Minimal Optimization (SMO), LibSVM, and a multilayer perceptron. Performance was evaluated using 5-fold and 10-fold cross-validation. The models were tested on the available data, and a comparative analysis gave accuracies of 77%, 75% and 85% for SMO, LibSVM and the multilayer perceptron, respectively. The multilayer perceptron with 5-fold cross-validation was therefore the best-performing classifier.
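
The evaluation procedure named in the abstract, k-fold cross-validation of a two-class classifier, can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature vectors are synthetic toy values rather than the paper's physicochemical parameters, the nearest-centroid classifier is a simple stand-in and not SMO, LibSVM or a multilayer perceptron, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_val_accuracy(X, y, k, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = fit_centroids(train_X, train_y)
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic, well-separated toy data (NOT the paper's data or features):
# two made-up numeric features per protein, ten proteins per class.
X = [[0.1 * i, 0.2 * i] for i in range(10)] + \
    [[10 + 0.1 * i, 10 + 0.2 * i] for i in range(10)]
y = ["H37Rv"] * 10 + ["Leprae"] * 10

for k in (5, 10):
    print(f"{k}-fold accuracy: {cross_val_accuracy(X, y, k):.2f}")
```

Swapping `fit_centroids` and `predict_centroid` for a real SVM or multilayer perceptron implementation would leave the cross-validation loop unchanged, which is what makes a like-for-like comparison between classifiers possible.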
