Locality sensitive hashing for sampling-based algorithms in association rule mining

Chyouhwa Chen; Shi-Jinn Horng; Chin-Pin Huang

Abstract

Association rule mining is one of the most important techniques for intelligent system design and has been widely applied in real applications. However, classical mining algorithms cannot process very large databases in a reasonable amount of time. The sampling approach, which processes only a subset of the whole database, is a viable alternative, although it cannot extract perfectly accurate rules. Previous works have tried to improve accuracy by removing "outliers" from the initial sample based on global statistical properties of the sample. In this paper, we take the view that the initial sample may actually consist of multiple, possibly overlapping, subsets or clusters. It is therefore more reasonable to apply data clustering techniques to the initial sample first and then perform outlier removal on the resulting clusters, so that outliers are removed based on the local properties of individual clusters. However, clustering transactional data with very high dimensions is a difficult problem in itself. We solve this problem by interpreting locality sensitive hashing as a means for data clustering. Previously proposed algorithms may then optionally be used to remove the outliers in the individual clusters. We propose several concrete algorithms based on this general strategy. Using an extensive set of synthetic and real datasets, we evaluate our proposed algorithms and find that they exhibit better accuracy or execution time, or both, than previously proposed algorithms.
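The idea of using locality sensitive hashing as a clustering step can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not name a hash family, so the sketch assumes MinHash with banding (a common LSH scheme for Jaccard similarity between sets), and the function names `minhash_signature` and `lsh_buckets`, the integer item encoding, and all parameter values are hypothetical choices made for illustration. Transactions that land in the same bucket are treated as one candidate cluster, and because each transaction is hashed once per band, the clusters may overlap.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # modulus for the assumed linear hash functions


def minhash_signature(transaction, hash_params):
    """MinHash signature of a non-empty set of integer item ids."""
    return tuple(
        min((a * item + b) % PRIME for item in transaction)
        for a, b in hash_params
    )


def lsh_buckets(transactions, num_hashes=8, bands=4, seed=0):
    """Group transaction indices into (possibly overlapping) LSH buckets.

    The signature is split into `bands` slices; two transactions share a
    bucket if they agree on every value within at least one slice.
    """
    rng = random.Random(seed)
    hash_params = [
        (rng.randrange(1, PRIME), rng.randrange(0, PRIME))
        for _ in range(num_hashes)
    ]
    rows = num_hashes // bands  # signature values per band
    buckets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        sig = minhash_signature(transaction, hash_params)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(tid)
    return buckets


if __name__ == "__main__":
    # Toy sample of transactions (hypothetical data, items as integers).
    sample = [{1, 2, 3}, {1, 2, 3}, {7, 8, 9}, {1, 2, 4}]
    for key, members in lsh_buckets(sample).items():
        if len(members) > 1:
            print(key[0], sorted(members))
```

Identical transactions always collide in every band, while similar ones collide with a probability that grows with their Jaccard similarity. Any per-cluster outlier removal would then run on the members of each bucket rather than on the whole sample.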

Bibliographic details

Source
Expert Systems with Applications | 2011, No. 10 | pp. 12388-12397 | 10 pages
Authors
Chyouhwa Chen; Shi-Jinn Horng; Chin-Pin Huang
Author affiliations

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 43, Keelung Road, Section 4, Taipei 10607, Taiwan, ROC
Original format: PDF
Language: English
Keywords
association rule mining; sampling; locality-sensitive hashing; clustering; outlier removal