Adherence clustering: an efficient method for mining market-basket clusters

Ching-Huang Yun; Kun-Ta Chuang; Ming-Syan Chen


Abstract

This paper explores the efficient clustering of market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse. Because they do not explicitly consider the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. With this measurement, we develop an efficient clustering algorithm for market-basket data, called algorithm k-todes, whose objective is to minimize the category-based adherence. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm k-todes significantly outperforms prior works in both execution efficiency and clustering quality as measured by information gain, indicating the usefulness of category-based adherence in market-basket data clustering.
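The two definitions in the abstract (item-to-cluster distance and category-based adherence) can be illustrated with a minimal Python sketch. It is not the authors' k-todes algorithm. It only computes the measure, under several assumptions that the abstract does not confirm: a "link" is taken to be one edge on the tree path between two taxonomy nodes, a cluster is represented simply by a list of tode nodes, and a transaction is assigned to the cluster with the smallest adherence. The toy taxonomy `PARENT` and the function names `links_between`, `item_distance`, `adherence`, and `assign` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy, stored as child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "drinks": "root",
    "snacks": "root",
    "cola": "drinks",
    "tea": "drinks",
    "chips": "snacks",
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def links_between(a: str, b: str) -> int:
    """Count taxonomy edges on the tree path between a and b (assumed meaning of 'links')."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # the first shared ancestor is the lowest one
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def item_distance(item: str, todes: Iterable[str]) -> int:
    """Distance of an item to a cluster: links to its nearest tode."""
    return min(links_between(item, t) for t in todes)


def adherence(transaction: Iterable[str], todes: Iterable[str]) -> float:
    """Category-based adherence: average item distance over the transaction."""
    items = list(transaction)
    tode_list = list(todes)
    return sum(item_distance(i, tode_list) for i in items) / len(items)


def assign(transaction: Iterable[str], clusters: Dict[str, List[str]]) -> str:
    """Pick the cluster with the smallest adherence (assumed assignment rule)."""
    items = list(transaction)
    return min(clusters, key=lambda c: adherence(items, clusters[c]))
```

For example, with hypothetical clusters `{"A": ["drinks"], "B": ["chips"]}`, the transaction `["cola", "tea"]` sits one link from the tode of A for each item and four links from the tode of B, so `assign` places it in A. Allowing todes at internal nodes such as `drinks` is what lets a cluster be described above the leaf level of the taxonomy.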


Bibliographic details

Source
Information Systems, 2006, No. 3, pp. 170-186 (17 pages)
Authors
Ching-Huang Yun; Kun-Ta Chuang; Ming-Syan Chen
Affiliation

Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan, ROC

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords
data mining; clustering market-basket data; category-based adherence; k-todes
Record added: 2022-08-18 02:48:08

Similar literature


1. Analyzing and Optimizing ANT-Clustering Algorithm by Using Numerical Methods for Efficient Data Mining [J]. Md. Asikur Rahman, Md. Mustafizur Rahman, Md. Mustafa Kamal Bhuiyan. International Journal of Data Mining & Knowledge Management Process, 2012, No. 5.
2. An efficient pixel clustering-based method for mining spatial sequential patterns from serial remote sensing images [J]. Wu Xiaozhu, Zhang Ximei. Computers & Geosciences, 2019, No. MARa.
3. Efficiently mining gene expression data via a novel parameterless clustering method [J]. Tseng V.S., Ching-Pin Kao. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005, No. 4.
4. Using category-based adherence to cluster market-basket data [C]. Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen. 2002.
5. Efficient genetic k-means clustering algorithm and its application to data mining on different domains [D]. Alsayat, Ahmed Mosa. 2016.
6. Efficient Mining of Discriminative Co-clusters from Gene Expression Data [O]. Omar Odibat, Chandan K. Reddy.
7. Analyzing and Optimizing ANT-Clustering Algorithm by Using Numerical Methods for Efficient Data Mining [O]. Md. Asikur Rahman. 2012.
