Journal of Hydrology

Using t-distributed Stochastic Neighbor Embedding (t-SNE) for cluster analysis and spatial zone delineation of groundwater geochemistry data


Abstract

Cluster analysis is a valuable tool for understanding spatial and temporal patterns (e.g., spatial zones) of groundwater geochemistry. Because the number of clusters and the cluster memberships are unknown in real-world problems, a number of methods have been used to assist cluster analysis, among which graphic approaches are popular and intuitive. This study introduced, for the first time, the t-distributed Stochastic Neighbor Embedding (t-SNE) method as a graphic approach to assist cluster analysis of groundwater geochemistry data. Hierarchical cluster analysis (HCA) was applied to the original groundwater geochemistry data, and t-SNE was used to help determine the number of clusters and the cluster memberships. Afterward, t-SNE was used to help delineate spatial zones of groundwater geochemistry. The t-SNE-based cluster visualization was compared to the visualization based on principal component analysis (PCA). By applying HCA, PCA, and t-SNE to three geochemical datasets (the Oslo transect, Taiyuan karst water, and Jianghan Plain groundwater datasets, which differ in the numbers of samples and features and were collected across different space and time scales), we found that t-SNE outperformed PCA in assisting HCA and is a promising tool for helping determine the number of HCA clusters and delineate spatial zones of groundwater geochemistry. It should be noted that t-SNE alone cannot be used for cluster analysis, partly because t-SNE visualization depends on a hyperparameter called perplexity, which is a priori unknown for real-world problems. The perplexity values used in this study were determined empirically, and a small value of 0.1 was used for the Taiyuan karst water dataset with 14 samples. For the other two datasets with hundreds of samples, the corresponding perplexity values were 20 and 30, within the range of 5-50 commonly used in t-SNE.
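
The workflow described in the abstract (HCA on the original data, with t-SNE and PCA used only to visualize the resulting clusters) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the synthetic data, the standardization step, the Ward linkage, the choice of three clusters, and all variable names are assumptions made for the example. Only the perplexity of 30 is taken from the values reported above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a geochemistry table (assumption): 180 samples,
# 6 features, drawn from three well-separated groups.
rng = np.random.default_rng(0)
group_means = [0.0, 5.0, 10.0]
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(60, 6)) for m in group_means])

# Standardize features so that no single variable dominates distances
# (assumption; the abstract does not state the preprocessing used).
X_std = StandardScaler().fit_transform(X)

# HCA on the original (standardized) data, not on the embedding.
# Ward linkage is an assumption; the abstract does not name the linkage.
Z = linkage(X_std, method="ward")
n_clusters = 3  # hypothetical candidate number of clusters
labels = fcluster(Z, t=n_clusters, criterion="maxclust")

# Two-dimensional views of the same data for visual comparison.
pca_xy = PCA(n_components=2).fit_transform(X_std)
tsne_xy = TSNE(
    n_components=2,
    perplexity=30,   # one of the values reported in the abstract
    init="pca",
    random_state=0,
).fit_transform(X_std)

# Each row of pca_xy / tsne_xy can now be plotted and colored by `labels`.
print(pca_xy.shape, tsne_xy.shape, sorted(set(labels.tolist())))
```

Plotting `tsne_xy` and `pca_xy` as scatter plots colored by `labels`, and repeating this for several candidate values of `n_clusters` and perplexity, is one way to check visually whether the HCA clusters form well-separated groups in each embedding.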
