Journal: Knowledge-Based Systems

Efficient colossal pattern mining in high dimensional datasets

Abstract

Frequent pattern mining is an important data mining problem that has been studied extensively over the last decade. Many algorithms have been developed for frequent pattern mining on traditional commercial datasets, which usually contain a huge number of transactions with a small number of items in each transaction. The advent of bioinformatics contributed to a new form of dataset, called high dimensional, which is characterized by a small number of transactions and a large number of items in each transaction. The running time of traditional algorithms increases exponentially with the average transaction length, so these algorithms are not suitable for high dimensional datasets. Moreover, mining algorithms on high dimensional datasets produce a very large output set that includes small and mid-sized frequent patterns, which carry no useful information for scientists. Colossal pattern mining has been proposed as a way to reduce the size of the output set: only very large (colossal) patterns are extracted, and skipping the small and mid-sized patterns also speeds up the mining process. In this paper we present an efficient vertical bottom-up method for mining frequent colossal patterns in high dimensional datasets. Our algorithm uses a bit matrix to compress the dataset and make it easy to use in the mining process. Our experimental results show that the algorithm attains very good mining efficiency on various input datasets. Furthermore, our performance study shows that it substantially outperforms the best previous algorithms.
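
The abstract does not specify how the bit matrix is laid out, so the following is only a minimal sketch of one common vertical encoding, not the authors' algorithm. It assumes one bit row per item, with bit i set when transaction i contains that item. The support of a pattern is then the number of set bits in the AND of its item rows. The names `build_bit_matrix` and `support` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_bit_matrix(transactions: List[Set[str]]) -> Dict[str, int]:
    """Return one bit row per item; bit i is set if transaction i contains the item.

    Assumed layout for illustration only; the paper's matrix may differ.
    """
    rows: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            rows[item] = rows.get(item, 0) | (1 << tid)
    return rows


def support(rows: Dict[str, int], pattern: Iterable[str], n_transactions: int) -> int:
    """Count the transactions that contain every item of the pattern."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in pattern:
        mask &= rows.get(item, 0)  # keep only transactions that contain this item
    return bin(mask).count("1")


# Toy dataset: few transactions, as in the high dimensional setting.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
rows = build_bit_matrix(transactions)
print(support(rows, {"a", "b"}, len(transactions)))
```

With this layout each item costs one machine integer regardless of how many patterns are tested, and a support check reduces to a few bitwise ANDs, which is presumably why a compressed bit representation is convenient when transactions are few and items are many.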
