Journal: Information Systems

Adherence clustering: an efficient method for mining market-basket clusters


Abstract

This paper explores efficient clustering of market-basket data. Unlike traditional data, market-basket data is known to be high-dimensional and sparse. Most prior work on clustering market-basket data does not explicitly consider the taxonomy and can therefore be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. With this measurement we develop an efficient clustering algorithm for market-basket data, called algorithm k-todes, whose objective is to minimize the category-based adherence. The distance of an item to a given cluster is defined as the number of links between this item and its nearest tode. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experiments on both real and synthetic datasets show that, with the taxonomy information, algorithm k-todes significantly outperforms prior work in both execution efficiency and clustering quality as measured by information gain, indicating the usefulness of category-based adherence in market-basket data clustering.
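The two definitions in the abstract (item-to-cluster distance and category-based adherence) can be illustrated with a minimal Python sketch. Everything beyond those two definitions is an assumption made for illustration: the toy taxonomy in `PARENT`, the reading of a "tode" as a taxonomy node that represents a cluster, the reading of "number of links" as the path length in the taxonomy tree, and the helper names `links_between`, `item_distance`, `adherence`, and `assign`. The abstract does not describe how k-todes selects or updates todes, so that part is not shown.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy stored as child -> parent; None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "all": None,
    "drinks": "all",
    "snacks": "all",
    "cola": "drinks",
    "juice": "drinks",
    "chips": "snacks",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the taxonomy root, inclusive."""
    path = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def links_between(a: str, b: str) -> int:
    """Number of tree edges on the path between two taxonomy nodes (assumed metric)."""
    depth_from_a = {n: d for d, n in enumerate(path_to_root(a))}
    for depth_from_b, n in enumerate(path_to_root(b)):
        if n in depth_from_a:  # first common ancestor on the way up from b
            return depth_from_a[n] + depth_from_b
    raise ValueError("nodes are not in the same taxonomy tree")


def item_distance(item: str, todes: List[str]) -> int:
    """Distance of an item to a cluster: links to its nearest tode."""
    return min(links_between(item, t) for t in todes)


def adherence(transaction: List[str], todes: List[str]) -> float:
    """Category-based adherence: average item distance over the transaction."""
    return sum(item_distance(i, todes) for i in transaction) / len(transaction)


def assign(transaction: List[str], clusters: Dict[str, List[str]]) -> str:
    """Pick the cluster with the smallest adherence (illustrative assignment step)."""
    return min(clusters, key=lambda name: adherence(transaction, clusters[name]))
```

Under these assumptions, a transaction `["cola", "juice"]` has a lower adherence to a cluster whose tode is `"drinks"` than to one whose tode is `"snacks"`, so `assign` places it in the former. This matches the stated objective of minimizing category-based adherence.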
