Estimating missed actual positives using independent classifiers

Abstract

Data mining is increasingly applied in environments with very high rates of data generation, such as network intrusion detection [7], where routers generate about 300,000 to 500,000 connections every minute. In such rare-class data domains, the cost of missing a rare-class instance is much higher than that of missing an instance of any other class. However, the high cost of manually labeling instances, the high rate at which data is collected, and real-time response constraints do not always allow one to determine the actual classes of the collected unlabeled datasets. Our previous work [9] explained this problem of missed false negatives in the context of two domains, "network intrusion detection" and "business opportunity classification". In such cases, an estimate of the number of missed high-cost, rare instances aids in evaluating the performance of the modeling technique (e.g., classification) used. A capture-recapture method was used for estimating false negatives, using two or more learning methods (i.e., classifiers). This paper focuses on the dependence between the class labels assigned by such learners. We define conditional independence for classifiers given a class label and show its relation to the conditional independence of the feature sets (used by the classifiers) given a class label. The latter is a computationally expensive problem, and hence a heuristic algorithm is proposed for obtaining conditionally independent (or less dependent) feature sets for the classifiers. Initial results of this algorithm on synthetic datasets are promising, and further research is being pursued.
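The capture-recapture idea mentioned in the abstract can be illustrated with the classical two-sample Lincoln-Petersen estimator. The sketch below is not taken from the paper: the function name `estimate_missed_positives`, the input format (sets of instance ids), and the toy data are assumptions made for illustration. It also assumes that the instances flagged by each classifier have been verified as actual positives, and that the two classifiers are conditionally independent given the positive class, which is the property the abstract is concerned with.

```python
def estimate_missed_positives(caught_by_a, caught_by_b):
    """Lincoln-Petersen estimate of actual positives missed by both classifiers.

    caught_by_a, caught_by_b: ids of verified actual positives flagged by
    classifier A and classifier B (hypothetical input format).
    """
    a = set(caught_by_a)
    b = set(caught_by_b)
    both = len(a & b)  # positives caught by both classifiers
    if both == 0:
        # With no overlap the estimator divides by zero and is undefined.
        raise ValueError("no overlap between the classifiers; estimate undefined")
    # Estimated total number of actual positives: N = n_a * n_b / m
    total_estimate = len(a) * len(b) / both
    # Positives seen by at least one classifier.
    seen = len(a | b)
    # Whatever remains is the estimated number missed by both.
    return total_estimate - seen


# Toy data (assumed): A catches 60 positives, B catches 50, and 20 are shared.
caught_a = set(range(0, 60))
caught_b = set(range(40, 90))
print(estimate_missed_positives(caught_a, caught_b))  # prints 60.0
```

If the classifiers tend to catch the same positives (positive dependence), the overlap is inflated and this estimate is too low, which is one way to see why the dependence between classifiers matters for the estimate.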

Bibliographic details

Source: ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 2005, pp. 648-653 (6 pages)
Authors: Sandeep Mane; Jaideep Srivastava; San-Yih Hwang
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: false negative

Similar literature

1. The estimated prevalence of missed positive lymph nodes based on extent of lymphadenectomy at radical prostatectomy [J]. Chakiryan Nicholas H., Acevedo Ann Martinez, Conlin Michael J. Urologic oncology. 2019, No. 9
2. 11 Lymph node ratio is an independent risk classifier in node positive breast cancer patients: results of the phase III BIG 02-98 trial [J]. O. Metzger, E. de Azambuja, E. Quinaux. European Journal of Cancer Supplements. 2010, No. 3
3. Estimating clinical outcomes and classifying CFTR variants of unknown significance in children with a positive newborn screening for Cystic Fibrosis [J]. Conti David V., Azen Colleen, Thomas Duncan C. Genetic epidemiology. 2015, No. 7
4. Estimating Missed Actual Positives Using Independent Classifiers [C]. Sandeep Mane, Jaideep Srivastava, San-Yih Hwang. Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'05); August 21-24, 2005; Chicago, IL (US). 2005
5. Comparing the accuracy of a photograph to actual skull morphology to classify contemporary U.S. populations into biological affinities, for forensic purposes [D]. Lomas, Lisa. 2012
6. Injury prevention: Individual factors affecting adult recreational snowboarders’ actual and estimated speeds on regular slopes [O]. Luis Carus, Isabel Castillo. 2021
7. Analyzing dependent data as if independent biases effect size estimates and increases the risk of false-positive findings [O]. Martin E. Héroux. 2021
8. Independent Converging FMS/LNAV Missed Approach Evaluation [R]. McCartor, G. R., Hasman, F., Jones, A. 1997
