Published in: Data Mining and Knowledge Discovery

An integrated, generic approach to pattern mining: data mining template library


Abstract

Frequent pattern mining (FPM) is an important data mining paradigm for extracting informative patterns such as itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL takes a generic data mining approach in which all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending the hierarchy. The mining process itself is configured the same way: the kind of mining approach to use, the data types and formats to mine over, and the back-end storage manager are all specified as a list of properties. This provides considerable flexibility to customize the toolkit for various applications, as exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software (http://dmtl.sourceforge.net/) and has been downloaded by numerous researchers from all over the world.
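
The idea of describing a pattern type by a list of properties can be illustrated with a small C++17 sketch. This is not DMTL's actual API: the tag names (`no_edges`, `directed`), the `properties` list, and the `contains` and `support` helpers are all hypothetical, chosen only to show how a generic routine might change behaviour based on compile-time properties.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <vector>

// Hypothetical property tags (illustrative names, not taken from DMTL).
struct no_edges {};  // elements are unordered, as in an itemset
struct directed {};  // elements are ordered, as in a sequence

// A pattern type is described by a compile-time list of properties.
template <typename... Props>
struct properties {};

// Compile-time query: does the property list contain property P?
template <typename P, typename List>
struct has_property;

template <typename P, typename... Props>
struct has_property<P, properties<Props...>>
    : std::disjunction<std::is_same<P, Props>...> {};

using itemset_props = properties<no_edges>;
using sequence_props = properties<directed>;

// Generic containment test whose meaning is selected by the properties.
template <typename PatternProps>
bool contains(const std::vector<int>& transaction,
              const std::vector<int>& pattern) {
    if constexpr (has_property<directed, PatternProps>::value) {
        // Ordered pattern: check that it occurs as a subsequence.
        std::size_t matched = 0;
        for (int item : transaction) {
            if (matched < pattern.size() && item == pattern[matched]) {
                ++matched;
            }
        }
        return matched == pattern.size();
    } else {
        // Unordered pattern: check that it is a subset.
        std::set<int> items(transaction.begin(), transaction.end());
        for (int item : pattern) {
            if (items.count(item) == 0) return false;
        }
        return true;
    }
}

// Support = number of database records that contain the pattern.
template <typename PatternProps>
std::size_t support(const std::vector<std::vector<int>>& db,
                    const std::vector<int>& pattern) {
    assert(!pattern.empty());
    std::size_t count = 0;
    for (const auto& record : db) {
        if (contains<PatternProps>(record, pattern)) ++count;
    }
    return count;
}
```

Under these assumptions, calling `support<itemset_props>(db, pattern)` and `support<sequence_props>(db, pattern)` on the same data runs the same counting loop with a different containment rule. A new pattern type would be added by defining a new property list, not by rewriting the loop.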
