Conference: 2011 IEEE International Conference on Computer Science and Automation Engineering

Interpretable knowledge acquisition for predicting DNA-binding domains using an evolutionary fuzzy classifier method

Abstract

DNA-binding domains are functional proteins in a cell that play a vital role in various essential biological activities. It is desirable to predict and analyze novel proteins from protein sequences alone using machine learning approaches. Numerous prediction methods have been proposed that identify informative features and design effective classifiers. The support vector machine (SVM) is well recognized as an accurate and robust classifier. However, the black-box mechanism of SVM offers low interpretability for biologists. It is preferable to design a prediction method whose features and prediction results are both interpretable. In this study, we propose an interpretable physicochemical property classifier (named iPPC) with an accurate and compact fuzzy rule base that uses a scatter partition of the feature space for DNA-binding data analysis. In designing iPPC, the flexible membership functions, the fuzzy rules, and the selection of physicochemical properties are optimized simultaneously. An intelligent genetic algorithm (IGA) is used to efficiently solve this design problem, which has a large number of tuning parameters, with three objectives: maximize prediction accuracy, minimize the number of selected features, and minimize the number of fuzzy rules. On benchmark datasets of DNA-binding domains, iPPC obtains a training accuracy of 81% and a test accuracy of 79% with three fuzzy rules and two physicochemical properties. Compared with the decision tree method, which has a training accuracy of 77%, iPPC has a more compact and interpretable knowledge base. The two physicochemical properties are "Number of hydrogen bond donors" and "Helix-coil equilibrium constant" in the AAindex database.
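The kind of classifier the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the membership-function shape, the inference scheme, or how the three objectives are combined, so the Gaussian memberships, the single-winner inference, the weighted-sum fitness, and every name and number below (`FuzzyRule`, `classify`, `fitness`, `w_feat`, `w_rule`, the toy samples) are assumptions made for illustration. The search itself (IGA) is omitted; the sketch only shows what a candidate solution and its score might look like.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FuzzyRule:
    """One rule in a scatter partition: its own fuzzy region plus a class label."""
    centers: List[float]  # membership-function center per selected feature
    widths: List[float]   # positive membership-function width per selected feature
    label: int            # class predicted when this rule wins

    def firing_strength(self, x: Sequence[float]) -> float:
        # Product of per-feature Gaussian memberships (the shape is an assumption).
        strength = 1.0
        for value, c, w in zip(x, self.centers, self.widths):
            strength *= math.exp(-((value - c) ** 2) / (2.0 * w ** 2))
        return strength


def classify(rules: Sequence[FuzzyRule], x: Sequence[float]) -> int:
    # Single-winner inference: the rule with the highest firing strength decides.
    return max(rules, key=lambda r: r.firing_strength(x)).label


def accuracy(rules: Sequence[FuzzyRule],
             samples: Sequence[Tuple[Sequence[float], int]]) -> float:
    correct = sum(1 for x, y in samples if classify(rules, x) == y)
    return correct / len(samples)


def fitness(rules: Sequence[FuzzyRule],
            samples: Sequence[Tuple[Sequence[float], int]],
            n_features: int,
            w_feat: float = 0.01,
            w_rule: float = 0.01) -> float:
    # Hypothetical weighted sum: reward accuracy, penalize the number of
    # selected features and the number of rules.
    return accuracy(rules, samples) - w_feat * n_features - w_rule * len(rules)


# Toy candidate: three rules over two (already normalized) features.
rules = [
    FuzzyRule(centers=[0.2, 0.2], widths=[0.2, 0.2], label=0),
    FuzzyRule(centers=[0.8, 0.8], widths=[0.2, 0.2], label=1),
    FuzzyRule(centers=[0.8, 0.2], widths=[0.2, 0.2], label=0),
]

# Toy samples (invented values, not taken from any benchmark dataset).
samples = [
    ((0.10, 0.30), 0),
    ((0.90, 0.70), 1),
    ((0.70, 0.10), 0),
    ((0.75, 0.90), 1),
]

if __name__ == "__main__":
    print("accuracy:", accuracy(rules, samples))
    print("fitness:", fitness(rules, samples, n_features=2))
```

In a setup like this, a genetic algorithm would encode the feature subset, the rule count, and each rule's centers, widths, and label in one chromosome and evolve it against `fitness`. That is one plausible reading of optimizing membership functions, rules, and feature selection simultaneously.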
