Supervised Machine Learning and Heuristic Algorithms for Outlier Detection in Irregular Spatiotemporal Datasets

Chowdhury K. P.


Abstract

A central problem in time series analysis is the detection of outliers, and the problem is further complicated when the time series is irregularly sampled and has spatiotemporal components. This paper presents one heuristic and two supervised machine learning algorithms for detecting outliers in univariate time series data of this kind, and compares their results with the automatic outlier detection methodology of Chen and Liu (1993). Large environmental databases that accept pollutant measurement data from virtually any source have recently been set up across many US states and around the world. To explore the viability of such databases and of the proposed methods, the procedures are applied to measurements of various surface water pollutants in the California Environmental Data Exchange Network (CEDEN). Although the proposed methodologies are not as robust as existing ones, they give similar results given the nature of the data, can be far less time intensive to implement, and provide interesting insights into the database. The algorithms can therefore be used widely with minimal computing resources and give tractable results even on very large datasets. The methodologies apply in a wide variety of contexts and to databases with similar measurement challenges across many disciplines, particularly in the environmental setting. In particular, the results have large potential regulatory impact on the accepted levels of different pollutants in California water bodies and on the amounts charged for industrial discharge into those water bodies, and are intended to provide direction for further research and regulatory investment. Based on the results, it seems reasonable to assume that there is further room for including pollutant measurements from nongovernmental agencies in the debate on environmental pollution, specifically in California.
However, the results also indicate that using such databases more inclusively for regulatory matters must be evaluated carefully on an individualized basis. This is to ensure that poorly collected or poorly handled measurements do not inundate the database and outweigh those collected with more rigor, which could make inference on the true population distribution of the pollutants more difficult. This is especially relevant for pollutant measurements that require more delicate sampling procedures.
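The abstract does not describe the three algorithms in detail, so the following is only a generic illustration of what a heuristic outlier check on an irregularly sampled univariate series can look like. It is not the paper's method. The function name `flag_outliers`, the time-window rule, the median/MAD score, and all parameter defaults are assumptions made for this sketch.

```python
from statistics import median


def flag_outliers(times, values, window=30.0, threshold=3.5, min_neighbors=3):
    """Flag points that deviate strongly from their neighbours in time.

    Hypothetical illustration only, not the algorithm from the paper.
    times  -- observation times (e.g. days); spacing may be irregular
    values -- measurements, same length as times
    """
    flags = []
    for i, (t, v) in enumerate(zip(times, values)):
        # Neighbours are selected by elapsed time, not by index position,
        # so gaps in an irregular series are respected.
        neighbors = [values[j] for j, s in enumerate(times)
                     if j != i and abs(s - t) <= window]
        if len(neighbors) < min_neighbors:
            flags.append(False)  # too little context to judge this point
            continue
        med = median(neighbors)
        mad = median(abs(x - med) for x in neighbors)
        if mad == 0:
            flags.append(False)  # no spread to scale against; left unflagged
            continue
        # Robust z-score; 0.6745 rescales the MAD to be comparable with a
        # standard deviation under a normal distribution.
        score = 0.6745 * abs(v - med) / mad
        flags.append(score > threshold)
    return flags


if __name__ == "__main__":
    # Made-up sample values for demonstration, not CEDEN data.
    sample_times = [0, 1, 3, 4, 10, 11, 12, 40]
    sample_values = [1.0, 1.2, 0.9, 1.1, 25.0, 1.0, 1.3, 1.1]
    print(flag_outliers(sample_times, sample_values, window=10.0))
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme reading from masking itself. The isolated last observation in the sample has no neighbours within the window, so it is left unflagged rather than judged against distant data.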


Bibliographic details

Source
Journal of environment informatics | 2019, Issue 1 | pp. 1-16 | 16 pages
Author
Chowdhury K. P.
Affiliation
Univ Calif Irvine, Paul Merage Sch Business, Irvine, CA 92697 USA
Original format: PDF
Language: English
Keywords
time series; irregular spatiotemporal time series; outlier detection; water pollution; CEDEN
