A Generative Pattern Model for Mining Binary Datasets

Abstract

In many application fields, huge binary datasets modeling real-life phenomena are produced daily. These datasets record observations of events, and people are often interested in mining them to recognize recurrent patterns. However, discovering the most important patterns is very challenging: patterns may overlap, or may relate only to a particular subset of the observations, and mining can be hindered by the presence of noise. In this paper, we introduce a generative pattern model and an associated cost model for evaluating the quality of a set of patterns extracted from a binary dataset. We propose an efficient algorithm, named GPM, for discovering the most relevant patterns according to the model. We show that the proposed model generalizes other approaches and supports the discovery of high-quality patterns.
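To make the idea of patterns that overlap, cover only some observations, and leave noise behind more concrete, here is a minimal Python sketch. It is not the paper's definition: the abstract does not spell out the generative model or the cost function. The sketch assumes a pattern is a pair (set of observations, set of items), that the dataset is generated as the Boolean OR of all pattern rectangles, and that the cost is the total pattern description size plus the number of mismatching cells. The names `Pattern`, `reconstruct`, and `cost` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a pattern is (rows, cols), i.e. the observations it applies to
# and the items it asserts are present in those observations.
Pattern = Tuple[Set[int], Set[int]]


def reconstruct(patterns: List[Pattern], n_rows: int, n_cols: int) -> List[List[int]]:
    """Generate a binary matrix as the Boolean OR of all pattern rectangles."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for rows, cols in patterns:
        for r in rows:
            for c in cols:
                matrix[r][c] = 1  # cells shared by overlapping patterns stay 1
    return matrix


def cost(data: List[List[int]], patterns: List[Pattern]) -> int:
    """Illustrative cost (an assumption): description size plus noisy cells."""
    n_rows, n_cols = len(data), len(data[0])
    model = reconstruct(patterns, n_rows, n_cols)
    description = sum(len(rows) + len(cols) for rows, cols in patterns)
    noise = sum(
        data[r][c] != model[r][c] for r in range(n_rows) for c in range(n_cols)
    )
    return description + noise


# Toy dataset: 5 observations (rows) over 4 items (columns).
data = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],  # the 1 in column 1 is not explained by either pattern
]

# Two patterns that overlap on cell (2, 2); each covers only some rows.
patterns = [({0, 1, 2}, {0, 1, 2}), ({2, 3, 4}, {2, 3})]

print(cost(data, patterns))  # cost with the two patterns
print(cost(data, []))        # cost with no patterns: every 1 counts as noise
```

Under these assumptions, a pattern set is preferable when it lowers the total cost relative to treating every 1 in the data as noise, which is the kind of trade-off a cost model for pattern sets has to capture.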
