Journal of Mathematical Chemistry

Outliers detection in the statistical accuracy test of a pKa prediction


Abstract

The regression diagnostics algorithm REGDIA in S-Plus is introduced to examine the accuracy of pKa values predicted with four programs: PALLAS, MARVIN, PERRIN and SYBYL. Outlier diagnostics are proposed on the basis of a statistical analysis of residuals. Residual analysis in the ADSTAT program examines goodness-of-fit via graphical diagnostics of 15 exploratory data analysis plots: bar plots, box-and-whisker plots, dot plots, midsum plots, symmetry plots, kurtosis plots, differential quantile plots, quantile-box plots, frequency polygons, histograms, quantile plots, quantile-quantile plots, rankit plots, scatter plots, and autocorrelation plots. Outliers in pKa correspond to molecules that are poorly characterized by the pKa program under consideration. Of the seven most efficient diagnostic plots (the Williams graph, the graph of predicted residuals, the Pregibon graph, the Gray L–R graph, the index graph of the Atkinson measure, the index graph of the diagonal elements of the hat matrix, and the rankit Q–Q graph of jackknife residuals), the Williams graph was selected as giving the most reliable detection of outliers. Six statistical characteristics, among them MEP, AIC, and s (in pKa units), were applied to a specimen of 25 acids and bases from Perrin's data set and successfully classified the four pKa prediction algorithms. The highest values of the other characteristics, the lowest values of MEP and s, and the most negative AIC were found for the PERRIN algorithm, so this algorithm achieves the best predictive power and the most accurate results. The proposed accuracy test of the REGDIA program can also be extended to other predicted values, such as log P, log D, aqueous solubility, or other physicochemical properties.

Keywords: pKa prediction; Dissociation constants; Outliers; Residuals; Goodness-of-fit; Williams graph
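The quantities behind a Williams graph can be illustrated with a minimal sketch. It is not the REGDIA or ADSTAT code. It assumes a straight-line regression of predicted pKa on experimental pKa (m = 2 parameters) and uses the common textbook definitions: hat-matrix diagonals h_ii, jackknife (externally studentized) residuals, MEP as the mean squared leave-one-out prediction error, and AIC = n ln(RSS/n) + 2m. The function name `williams_diagnostics`, the cutoff `t_cut`, and the ten data points are hypothetical, chosen for illustration only.

```python
import math

def williams_diagnostics(x, y, t_cut=2.0):
    """Straight-line fit y = b0 + b1*x with Williams-graph quantities.

    t_cut is an assumed cutoff for |jackknife residual|; a t-quantile
    with n - m - 1 degrees of freedom is a common choice instead.
    """
    n, m = len(x), 2  # m = number of regression parameters
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    s2 = rss / (n - m)  # residual variance

    # Diagonal of the hat matrix for simple linear regression.
    hat = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Jackknife residuals: each point is scaled by the residual
    # variance of the fit with that point left out.
    jack = []
    for e, h in zip(resid, hat):
        s2_del = ((n - m) * s2 - e * e / (1.0 - h)) / (n - m - 1)
        jack.append(e / math.sqrt(s2_del * (1.0 - h)))

    mep = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, hat)) / n
    aic = n * math.log(rss / n) + 2 * m

    h_cut = 2.0 * m / n  # usual leverage cutoff on the Williams graph
    return {
        "b0": b0, "b1": b1, "s": math.sqrt(s2), "mep": mep, "aic": aic,
        "hat": hat, "jackknife": jack,
        "outliers": [i for i, t in enumerate(jack) if abs(t) > t_cut],
        "leverages": [i for i, h in enumerate(hat) if h > h_cut],
    }

# Synthetic illustration data (not from Perrin's data set):
# the prediction at index 4 is deliberately off by about 3 pKa units.
pka_exp = [2.0, 3.1, 4.2, 4.8, 5.5, 6.3, 7.1, 8.0, 9.2, 10.4]
pka_pred = [2.1, 3.0, 4.3, 4.7, 8.5, 6.4, 7.0, 8.1, 9.1, 10.5]

diag = williams_diagnostics(pka_exp, pka_pred)
print("flagged outliers:", diag["outliers"])
print("s = %.3f, MEP = %.3f, AIC = %.2f" % (diag["s"], diag["mep"], diag["aic"]))
```

Plotting `diag["jackknife"]` against `diag["hat"]`, with a horizontal line at the residual cutoff and a vertical line at 2m/n, gives the Williams graph: points above the horizontal line are outlier candidates, and points to the right of the vertical line are high-leverage points.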
