Detecting and Explaining Exceptional Values in Categorical Data: DISCUSSION PAPER

Abstract

In this work we deal with the problem of detecting and explaining exceptionally behaving values in categorical datasets. An attribute value is perceived as anomalous if its frequency of occurrence is exceptionally typical or untypical within the distribution of the frequencies of occurrence of the other attribute values. The notion of frequency of occurrence is provided by specialising the Kernel Density Estimation (KDE) method to the domain of frequency values, and an outlierness measure is defined by leveraging the cumulative distribution function (cdf) of such a density. This measure is able to simultaneously identify two kinds of anomalies, called lower outliers and upper outliers, namely values with exceptionally low or exceptionally high frequency. Moreover, data values labeled as outliers come with interpretable explanations for their abnormality, which is a desirable feature of any knowledge discovery technique.
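The abstract only outlines the method, so what follows is a minimal sketch of the general idea under stated assumptions, not the authors' outlierness measure. It counts how often each value of one categorical attribute occurs, fits a plain Gaussian kernel density estimate to the frequencies of the other values, and reads off the cdf of that estimate at the value's own frequency: a cdf near 0 marks a lower outlier, a cdf near 1 an upper outlier. The Gaussian kernel, the MAD-based bandwidth rule, the two-tailed threshold `alpha`, and the names `label_values`, `robust_bandwidth` and `kde_cdf` are hypothetical choices; the paper's specialisation of KDE to the domain of frequency values and its explanation mechanism are not reproduced here.

```python
import math
from collections import Counter
from statistics import median


def normal_cdf(z):
    """Standard normal cdf, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def robust_bandwidth(xs):
    """Hypothetical rule-of-thumb bandwidth based on the median absolute deviation."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    sigma = 1.4826 * mad  # rescale MAD to estimate a Gaussian standard deviation
    h = 1.06 * sigma * len(xs) ** (-0.2)
    # Frequencies are integer counts; never smooth below half a count.
    return max(h, 0.5)


def kde_cdf(x, sample, h):
    """cdf at x of a Gaussian KDE with bandwidth h fitted to `sample`."""
    return sum(normal_cdf((x - s) / h) for s in sample) / len(sample)


def label_values(column, alpha=0.05):
    """Return {value: (label, cdf)} with label in {'lower', 'upper', 'inlier'}.

    Each value's frequency is compared with the KDE of the frequencies of all
    other values of the same attribute (leave-one-out).
    """
    counts = Counter(column)
    if len(counts) < 2:  # nothing to compare against
        return {v: ("inlier", 0.5) for v in counts}
    result = {}
    for value, freq in counts.items():
        others = [c for w, c in counts.items() if w != value]
        p = kde_cdf(freq, others, robust_bandwidth(others))
        if p <= alpha:
            label = "lower"   # exceptionally rare compared with the other values
        elif p >= 1.0 - alpha:
            label = "upper"   # exceptionally frequent compared with the other values
        else:
            label = "inlier"
        result[value] = (label, p)
    return result


# Toy attribute: "a" is very frequent, "z" very rare, "b".."k" occur 9 to 11 times.
freqs = {"a": 50, "z": 1, "b": 9, "c": 10, "d": 11, "e": 10,
         "f": 9, "g": 11, "h": 10, "i": 10, "j": 9, "k": 11}
column = [v for v, n in freqs.items() for _ in range(n)]
for value, (label, p) in sorted(label_values(column).items()):
    if label != "inlier":
        print(value, label, round(p, 3))
# prints:
# a upper 1.0
# z lower 0.0
```

Comparing each value against the other values only, with a median-based bandwidth, keeps one extreme value from inflating the smoothing and masking the opposite extreme. In this toy example, a standard-deviation-based rule of thumb lets the very frequent "a" widen the kernel enough that the rare "z" is no longer flagged at alpha = 0.05.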

机译：在这项工作中，我们通过将属性值视为异常来处理分类数据集中异常行为值的检测和解释问题，如果属性值的出现频率在任何其他属性值的出现频率分布中异常典型或不典型。频率发生的概念是通过将核密度估计方法专门化到频率值域来提供的，离群度度量是通过利用这种密度的cdf来定义的。这种测量方法能够同时识别两种异常，称为低异常值和高异常值，即异常低或异常高频率值。此外，标记为异常值的数据值为其异常提供了可解释的解释，这是任何知识发现技术的理想特征。

Bibliographic record

Source: Italian Symposium on Advanced Database Systems, 2020, 356p, 8 pages in total
Authors: Fabrizio Angiulli; Fabio Fassetti; Luigi Palopoli; Cristina Serrao
Original format: PDF
Chinese Library Classification: TP392-53

Similar literature

1. The application of big data and the development of nursing science: A discussion paper [J]. Ruifang Zhu, Shifan Han, Yanbing Su. International Journal of Nursing Sciences, 2019, no. 2.
2. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set [J]. Amir Ahmad, Lipika Dey. Pattern Recognition Letters, 2007, no. 1.
3. Discussion of "regularized regression for categorical data" [J]. Bryan E. Shepherd, Qi Liu. Statistical Modeling: Applications in Contemporary Issues, 2016, no. 3.
4. Discussion of "regularized regression for categorical data" by Tutz and Gertheiss [J]. Chenlei Leng. Statistical Modeling: Applications in Contemporary Issues, 2016, no. 3.
5. A Density Estimation Approach for Detecting and Explaining Exceptional Values in Categorical Data [C]. Fabrizio Angiulli, Fabio Fassetti, Luigi Palopoli. International Conference on Discovery Science, 2019.
6. Categorical properties of lattice-valued convergence spaces [D]. Paul V. Flores, 2007.
7. Discussion of "Regularized Regression for Categorical Data" [O]. Bryan E. Shepherd, Qi Liu.
8. Erratum: Oncogenic KRAS Regulates Tumor Cell Signaling via Stromal Reciprocation [O]. Christopher J. Tape, Stephanie Ling, Maria Dimitriadi, 2016.
9. Bridging Gaps in Police Crime Data: A Discussion Paper from the BJS Fellows Program [R]. M. D. Maltz, 1999.