Significant Pattern Mining: Efficient Algorithms and Biomedical Applications

Abstract

Pattern mining techniques such as itemset mining, sequence mining, and graph mining have been applied to a wide range of datasets. To convince biomedical researchers, however, it is necessary to demonstrate the statistical significance of the obtained patterns, showing that they are unlikely to emerge from random data. The key concept in significance testing is the family-wise error rate (FWER), i.e., the probability that at least one pattern is falsely discovered under the null hypotheses. In the worst case, the FWER grows linearly with the number of all possible patterns. We show that, in practice, the FWER grows much more slowly than in the worst case, so that it is possible to find significant patterns in biomedical data. The following two properties are exploited to bound the FWER accurately and to compute small p-value correction factors: (1) only closed patterns need to be counted; (2) patterns of low support can be ignored, where the support threshold depends on the Tarone bound. We introduce efficient depth-first search algorithms for discovering all significant patterns and discuss parallel implementations.
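The role of the Tarone bound can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: each pattern is tested with a one-sided Fisher's exact test against a binary class label, and the candidate patterns (for example, the closed ones) have already been enumerated with their supports. The function names `min_pvalue` and `tarone_factor` are hypothetical and chosen only for this illustration; the depth-first search and its parallelization are not reproduced here.

```python
from math import comb


def min_pvalue(support, n_pos, n_total):
    """Smallest p-value a pattern with this support could ever reach.

    Assumes a one-sided Fisher's exact test: the most extreme table puts
    as many occurrences of the pattern as possible in the positive class.
    """
    a = min(support, n_pos)  # occurrences falling in the positive class
    return comb(n_pos, a) * comb(n_total - n_pos, support - a) / comb(n_total, support)


def tarone_factor(supports, n_pos, n_total, alpha=0.05):
    """Tarone's correction factor for the given candidate patterns.

    Returns the smallest k such that at most k patterns have a minimum
    achievable p-value <= alpha / k. Testing those patterns at level
    alpha / k keeps the FWER at or below alpha.
    """
    pmins = [min_pvalue(s, n_pos, n_total) for s in supports]
    for k in range(1, len(pmins) + 1):
        testable = sum(p <= alpha / k for p in pmins)
        if testable <= k:
            return k
    return 1  # only reached when there are no patterns at all


# Toy data (illustrative only): 20 samples, 10 positive, 10 candidate patterns.
supports = [1, 1, 2, 2, 3, 4, 5, 6, 8, 10]
k = tarone_factor(supports, n_pos=10, n_total=20, alpha=0.05)
print("Bonferroni factor:", len(supports))
print("Tarone factor:", k)
```

In this toy case the low-support patterns can never reach significance, so they drop out of the count and the correction factor is smaller than the Bonferroni factor, which equals the number of candidate patterns.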
