Journal: BMC Genomics

Context-based preprocessing of molecular docking data


Abstract

Background: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.

Results: We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures.

Conclusions: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models.
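The measures named in the abstract can be illustrated with a minimal sketch. RMSE, MAE, and Pearson correlation below follow their standard textbook definitions. The abstract does not say how the context measures are computed, so the set-based precision, recall, and F-score here are an assumption: they compare a selected feature set against a reference set of features considered relevant. All function names and sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def pearson(actual, predicted):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def precision_recall_fscore(selected, relevant):
    """Set-based precision, recall and F-score (assumed definition, see lead-in)."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


if __name__ == "__main__":
    # Hypothetical target values and model predictions.
    observed = [1.0, 2.0, 3.0]
    estimated = [1.0, 2.0, 5.0]
    print(rmse(observed, estimated), mae(observed, estimated))
    # Hypothetical feature names: four selected, three considered relevant.
    print(precision_recall_fscore({"a", "b", "c", "d"}, {"a", "b", "e"}))
```

Lower RMSE and MAE and higher correlation indicate better predictive fit. Under the assumed set-based definition, higher precision and recall mean the selected features overlap more closely with the reference set.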
