Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

Statistical Hypothesis Testing in Positive Unlabelled Data

Abstract

We propose a set of novel methodologies that enable valid statistical hypothesis testing when only positive and unlabelled (PU) examples are available. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and finally, (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will also be useful for information-theoretic feature selection and Bayesian network structure learning.
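
To make the first contribution concrete, here is a minimal sketch of an independence test in the PU setting, in which every unlabelled example is treated as a negative. It is an illustration under stated assumptions, not the authors' implementation: the function names, the binary feature `x`, the surrogate label `s` (1 = labelled positive, 0 = unlabelled), and the toy counts are all hypothetical. The statistic is the standard G-statistic of a generalised likelihood ratio test on a 2x2 contingency table, compared against a chi-square distribution with one degree of freedom.

```python
import math


def contingency_table(x, s):
    """Count co-occurrences of a binary feature x and a binary surrogate label s.

    s[i] == 1 means "labelled positive"; s[i] == 0 means "unlabelled",
    which this sketch treats as a negative (an assumption of the example).
    """
    table = [[0, 0], [0, 0]]
    for xi, si in zip(x, s):
        table[xi][si] += 1
    return table


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) over the cells of a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def p_value_df1(g):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


# Hypothetical toy data: 80 examples, feature value versus surrogate label.
x = [1] * 40 + [0] * 40
s = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

g = g_statistic(contingency_table(x, s))
print(f"G = {g:.3f}, p = {p_value_df1(g):.2e}")
```

A small p-value here suggests rejecting independence between the feature and the surrogate label. As the abstract notes, this shortcut is claimed to be sufficient for independence testing only; the sketch does not attempt the correction needed for power analysis.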
