International Journal of Applied Engineering Research

Study of Association Rule Mining for Discovery of Frequent Item Sets on Big Data Sets



Abstract

Frequent pattern mining is a significant research area in data mining, and it has drawn the attention of many researchers since its introduction. Data generation and collection across all areas are growing exponentially in size. Knowledge discovery and decision making require the capability to process and extract insight from "Big" Data in a scalable and efficient manner. Data mining is the application of sophisticated analysis to large amounts of data in order to discover new knowledge in the form of patterns, trends, and associations. With the advent of the World Wide Web, the quantity of data stored and available electronically has grown enormously, and the process of knowledge discovery (data mining) from this data has become central for the business and scientific research communities. Many algorithms have been proposed to mine frequent itemsets. Popular algorithms include level-wise Apriori-based algorithms, tree-based algorithms, and algorithms based on hyperlinked array structures. While these algorithms are widely accepted and useful because of some good properties, they also suffer from problems such as multiple database scans, recursive tree constructions, or multiple hyperlink adjustments. In the current era of big data, high volumes of a wide variety of valuable data of different veracities can easily be collected or generated at high velocity in a range of real-life applications. The problem of mining frequent queries in a relational database defined over a star schema is not easy even when dealing with only one table: on the one hand, the search space is huge (because it encompasses all possible queries that can be addressed to a given database), and on the other hand, testing whether two queries are equivalent (which entails redundant support computations) is NP-complete. The problem is therefore even more difficult when such methods are applied to Big Data.
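To make the "multiple database scans" cost of level-wise Apriori-based algorithms concrete, the following is a minimal Python sketch of the classic level-wise scheme. It is an illustration under stated assumptions, not the algorithm studied in the paper: the function name `apriori`, the absolute `min_support` count, and the `scans` counter are all hypothetical choices made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (illustrative sketch).

    Returns ({itemset: support_count}, scans), where `scans` is the number
    of support-counting passes made over the whole database.
    `min_support` is an absolute count, an assumption of this sketch.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    scans = 0
    k = 1
    # Level 1 candidates: every distinct item as a singleton itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    while candidates:
        # One full pass over the database per level: this is the
        # repeated-scan cost mentioned in the abstract.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are
        # frequent (the downward-closure property of support).
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans
```

Each iteration of the `while` loop reads every transaction once, so the number of passes grows with the size of the longest frequent itemset. On disk-resident or distributed data, each such pass is expensive, which is the weakness the abstract points to.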
In this paper we focus on handling high volumes of big data to generate frequent itemsets. We examine in detail the problems related to Frequent Pattern Mining (FPM) in distributed and large data sets and present a general framework for adapting an exact Frequent Pattern Mining algorithm.
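The abstract does not describe the framework itself. As one well-known way an exact miner can be adapted to partitioned data, the sketch below uses a two-phase, partition-based scheme: mine each partition locally, then verify the union of local results against the whole data set. This is a swapped-in textbook technique shown for orientation only, not the authors' framework. The names `local_frequent` and `partitioned_mine`, the brute-force local miner, and the `max_size` cap are assumptions of this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import ceil


def local_frequent(partition, min_count, max_size):
    """Brute-force local miner (hypothetical helper): counts every itemset
    of up to `max_size` items and keeps those meeting `min_count`."""
    counts = Counter()
    for t in partition:
        items = sorted(t)
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(frozenset(c) for c in combinations(items, k))
    return {s for s, n in counts.items() if n >= min_count}


def partitioned_mine(partitions, min_fraction, max_size=3):
    """Two-phase mining over partitions (illustrative sketch).

    `min_fraction` is the relative support threshold. Results are exact
    for itemsets of at most `max_size` items.
    """
    f = Fraction(str(min_fraction))  # exact arithmetic, avoids float rounding
    total = sum(len(p) for p in partitions)
    # Phase 1: each partition is mined independently (this step could run
    # on separate workers). An itemset that is globally frequent must be
    # locally frequent in at least one partition, so the union of local
    # results is a complete candidate set.
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, ceil(f * len(p)), max_size)
    # Phase 2: one more pass over all partitions computes exact global
    # support for the candidates and discards false positives.
    counts = Counter()
    for p in partitions:
        for t in p:
            t = frozenset(t)
            counts.update(c for c in candidates if c <= t)
    return {c: n for c, n in counts.items() if n >= f * total}
```

The design choice worth noting is that phase 1 may over-generate candidates but never misses a globally frequent itemset, so the final answer stays exact while the data is read only twice.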
