The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling of spatial data suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampling. We also introduce a method to ensure that thresholding of low values based on sampled data does not omit any regions above the desired threshold. We demonstrate the effectiveness of our approach using both artificial and real-world large geospatial datasets.
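The following is a minimal sketch, not the paper's method, illustrating the baseline the abstract compares against: building a kernel density estimate from a naive random subsample and measuring the error it introduces relative to a KDE over the full data. The use of scipy's gaussian_kde, the synthetic point cloud, the sample fraction, and the evaluation grid are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical "full" dataset: 50,000 synthetic geo-located points (lon, lat).
points = rng.normal(loc=[-122.4, 37.8], scale=[0.5, 0.3], size=(50_000, 2))

# Naive random subsample: keep 1% of the points.
sample = points[rng.choice(len(points), size=500, replace=False)]

# Kernel density estimates from the full data and from the random subsample.
kde_full = gaussian_kde(points.T)
kde_sample = gaussian_kde(sample.T)

# Compare the two estimates on a coarse lon/lat evaluation grid.
lon = np.linspace(-124.0, -121.0, 50)
lat = np.linspace(37.0, 38.6, 50)
grid = np.array(np.meshgrid(lon, lat)).reshape(2, -1)
err = np.abs(kde_full(grid) - kde_sample(grid)).mean()
print(f"mean absolute KDE error from random sampling: {err:.3e}")
```

A spatially aware subsampler, as proposed in the paper, aims to drive this error below what random sampling achieves at the same sample size; the thresholding guarantee additionally ensures that no grid cell above a chosen density threshold is dropped when the threshold is applied to the sampled estimate.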