Applied Geochemistry: Journal of the International Association of Geochemistry and Cosmochemistry

Statistical characterization of a large geochemical database and effect of sample size


Abstract

The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States, analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal, and the challenge, of the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States) and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These plots sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements passed the test for either a normal or a lognormal distribution on the declustered data set, partly because of the presence of mixtures of subpopulations and outliers.
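The idea of reading subpopulations from changes in slope on a Q-Q plot can be sketched with synthetic data. The example below is illustrative only and is not the authors' procedure: the two populations, their parameters, the sample counts, and the index ranges used for the two segments are all assumptions chosen for the demonstration. It builds normal Q-Q coordinates for log10 concentrations (that is, a lognormal Q-Q plot) and compares the fitted slope of the central segment with that of the upper tail.

```python
import random
from statistics import NormalDist, mean

def qq_points(values):
    """Return (theoretical normal quantile, sorted value) pairs for a normal Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def fitted_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

rng = random.Random(0)
# Synthetic log10 concentrations (assumed values, not NGS data):
# a large background population plus a small enriched population.
background = [rng.gauss(1.3, 0.2) for _ in range(1900)]
enriched = [rng.gauss(2.8, 0.4) for _ in range(100)]

pts = qq_points(background + enriched)

# Hand-picked segments: the central part is dominated by the background,
# the top of the distribution by the enriched population.
lower_slope = fitted_slope(pts[200:1600])
upper_slope = fitted_slope(pts[-80:])

print(f"central segment slope: {lower_slope:.2f}")
print(f"upper tail slope:      {upper_slope:.2f}")
```

On a Q-Q plot each near-linear segment has a slope related to the spread of the subpopulation that dominates it, so a marked difference between the two slopes is the numerical counterpart of a visible kink.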
Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests, and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histograms, stem-and-leaf plots, and probability plots are recommended for a rough judgement of the probability distribution, if one is needed.
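The sample-size effect described above can be reproduced on synthetic data. This is a minimal sketch under stated assumptions, not the authors' analysis: the abstract does not name the tests used, so the Jarque-Bera test is swapped in here because its p-value has a closed form, and the mildly non-normal population (a 90/10 mixture of two normal distributions), the subsample sizes, and the number of repeats are all hypothetical. Only the population size of 16,511 echoes the declustered data set.

```python
import math
import random

def jarque_bera_pvalue(values):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return math.exp(-jb / 2.0)

def rejection_rate(population, n, repeats, rng, alpha=0.05):
    """Fraction of random subsamples of size n for which normality is rejected."""
    rejections = 0
    for _ in range(repeats):
        subsample = rng.sample(population, n)  # sampling without replacement
        if jarque_bera_pvalue(subsample) < alpha:
            rejections += 1
    return rejections / repeats

rng = random.Random(42)
# Assumed population: mostly N(0, 1) with a 10% admixture shifted by 1.5,
# which gives a slight positive skew.
population = [
    rng.gauss(1.5, 1.0) if rng.random() < 0.10 else rng.gauss(0.0, 1.0)
    for _ in range(16511)
]

rates = {n: rejection_rate(population, n, repeats=100, rng=rng)
         for n in (100, 300, 1000, 5000)}
for n, rate in rates.items():
    print(f"n = {n:5d}: normality rejected in {rate:.0%} of subsamples")
```

The same small departure from normality is rarely detected in subsamples of about a hundred points and almost always detected in subsamples of several thousand, which is the behaviour that motivates graphical checks for large n.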
