Journal of Chemical Information and Modeling

Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors


Abstract

Cluster algorithms play an important role in diversity-related tasks in modern chemoinformatics, and their widest application is in pharmaceutical drug discovery programs. The performance of these grouping strategies depends on several factors, such as the molecular representation, the mathematical method, the algorithmic technique, and the statistical distribution of the data. For this reason, new methods must be introduced and compared in order to find the model that best fits the problem at hand. Earlier comparative studies report that Ward's algorithm, with fingerprints as the molecular description, is generally superior in this field. However, problems remain: other types of numerical descriptors have been little exploited, the current descriptor selection strategy is driven by trial and error, and no previous comparative study has covered a broader range of combinatorial methods for grouping chemoinformatic data sets. In this work, combinatorial methods are compared, five of which are new to chemoinformatics. The experiments use eight data sets that are well established and validated in the medicinal chemistry literature. Each drug data set was represented by real-valued molecular descriptors selected by machine learning techniques, which are consistent with the neighborhood principle. Statistical analysis of the results shows that the pharmacological activities of the eight data sets can be modeled with a few families of 2D and 3D molecular descriptors, avoiding the classification problems caused by nonrelevant features. Three of the five proposed clustering algorithms outperform most classical algorithms and perform similarly to (or, in the most optimistic reading, slightly better than) Ward's algorithm.
The usefulness of these algorithms is also assessed in a comparative experiment against powerful QSAR and machine learning classifiers, where they perform similarly in some cases.
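The abstract names Ward's algorithm and "combinatorial" clustering methods without defining them. As a minimal sketch, assuming "combinatorial" carries its classical meaning (agglomerative methods whose between-cluster dissimilarities can be updated from the previous ones via the Lance-Williams recurrence), the Python below runs Ward's method in that form on squared Euclidean distances. The function names and toy input are hypothetical illustrations, not the authors' code, and the sketch does not reproduce the five new methods evaluated in the paper.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_lance_williams(points):
    """Agglomerative clustering with Ward's criterion via the Lance-Williams update.

    Hypothetical helper for illustration. Returns a list of merges
    (cluster_a, cluster_b, height), where clusters are frozensets of point
    indices and heights are on the squared-distance scale.
    """
    # Every point starts as its own singleton cluster.
    clusters = {i: frozenset([i]) for i in range(len(points))}
    sizes = {i: 1 for i in clusters}
    # Ward's recurrence is exact when started from squared Euclidean distances.
    dist = {(i, j): squared_euclidean(points[i], points[j])
            for i, j in combinations(clusters, 2)}

    def d(a, b):
        # Distances are stored once per unordered pair.
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Merge the closest pair of currently active clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: d(*p))
        merges.append((clusters[i], clusters[j], d(i, j)))
        ni, nj = sizes[i], sizes[j]
        new = next_id
        next_id += 1
        for k in clusters:
            if k in (i, j):
                continue
            nk = sizes[k]
            total = ni + nj + nk
            # Ward coefficients: alpha_i = (ni+nk)/total, alpha_j = (nj+nk)/total,
            # beta = -nk/total, gamma = 0.
            dist[(k, new)] = ((ni + nk) * d(i, k)
                              + (nj + nk) * d(j, k)
                              - nk * d(i, j)) / total
        clusters[new] = clusters[i] | clusters[j]
        sizes[new] = ni + nj
        for old in (i, j):
            del clusters[old]
            del sizes[old]
    return merges


if __name__ == "__main__":
    # Toy one-dimensional "descriptors": the first two points merge first.
    for a, b, h in ward_lance_williams([(0.0,), (1.0,), (10.0,)]):
        print(sorted(a), sorted(b), h)
```

Because every combinatorial method differs only in its Lance-Williams coefficients, swapping the update line is enough to turn this sketch into single, complete, or average linkage, which is one reason such a family is convenient to compare side by side.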
