Journal: Procedia Computer Science

Comparison of Feature Selection Methods to Classify Inhibitors in DUD-E Database



Abstract

In drug design, an inhibitor compound is typically used to regulate the activity of an enzyme in order to treat a particular disease. Inhibitors are classified with docking software, which simulates the binding of a compound (a new inhibitor candidate) to the target enzyme. DUD-E is a database for docking simulation whose data are high-dimensional, which makes machine learning a feasible analytical tool. A compound can be classified as a ligand or a decoy from its characteristics, but the large number of characteristics poses a problem for machine learning algorithms. This paper discusses feature selection analysis to identify the compound characteristics that effectively determine whether a compound is a ligand or a decoy. It examines Mutual Information-based Feature Selection (MIFS), Correlation-based Feature Selection (CFS), and the Fast Correlation-Based Filter (FCBF). The results show that FCBF always selects the fewest features and gives the fastest classification runtime. The highest classification accuracy is obtained when k-NN classifies with all features; however, the accuracy obtained with the selected features differs only slightly. CFS performs well for Data-A with an accuracy of 89.55%, while MIFS outperforms the other methods for Data-B and Data-C with classification accuracies of 92.34% and 95.20%, respectively.
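As an illustration of the filter-style selection the abstract compares, the sketch below ranks discrete features by symmetrical uncertainty, the relevance measure that FCBF is built on. It covers only the relevance-ranking step, not FCBF's redundancy removal, and it is not the authors' implementation. The function names, the threshold, and the toy ligand/decoy data are all hypothetical; none of the values come from DUD-E.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing to measure
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2 * mutual_info / (h_x + h_y)


def rank_features(features, labels, threshold=0.0):
    """Return (name, SU) pairs with SU above threshold, most relevant first."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in features.items()}
    kept = [(name, su) for name, su in scores.items() if su > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: 1 = ligand, 0 = decoy; descriptors are made up.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "descriptor_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly tracks the label
    "descriptor_b": [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
    "descriptor_c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

ranking = rank_features(features, labels)
for name, su in ranking:
    print(f"{name}: SU = {su:.3f}")
```

Continuous descriptors would need to be discretized before this measure applies. A fuller FCBF sketch would then drop any feature whose symmetrical uncertainty with an already selected feature is at least as high as its symmetrical uncertainty with the label.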
