Using category-based adherence to cluster market-basket data

Abstract

We devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse, and to contain massive outliers. Because they do not explicitly consider the presence of a taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for presenting the results of clustering techniques for market-basket data. In view of these features of market-basket data, we devise a measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between the item and its nearest large node in the taxonomy tree, where a large node is an item or a category node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in the transaction to that cluster. With this measurement, we develop an efficient clustering algorithm for market-basket data, called algorithm CBA, whose objective is to minimize the category-based adherence. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm CBA significantly outperforms prior works in both execution efficiency and clustering quality for market-basket data.
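
The distance and adherence definitions in the abstract can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' algorithm CBA: the toy taxonomy, the cluster contents, and every function name (including the `exclude` parameter) are hypothetical. It assumes that occurrence counts are computed per cluster, that a transaction counts once for each item or category it touches, that "exceeds" means strictly greater than the threshold, that distance is measured by breadth-first search over the undirected taxonomy tree, and that the root is excluded from the large nodes (otherwise it would be large in every cluster bigger than the threshold). The last step simply assigns a transaction to the cluster with the lowest adherence; the full CBA procedure is not reproduced.

```python
from collections import deque

# Hypothetical helper names and toy data; a sketch, not the authors' CBA code.
# Toy taxonomy (child -> parent); "root" is the top node.
PARENT = {
    "milk": "dairy", "cheese": "dairy",
    "beer": "drinks", "juice": "drinks",
    "dairy": "root", "drinks": "root",
}
ROOT = "root"


def ancestors(item, parent):
    """Yield the item itself, then each ancestor up to the root."""
    node = item
    while node is not None:
        yield node
        node = parent.get(node)


def large_nodes(cluster, parent, threshold, exclude=frozenset()):
    """Items or categories whose occurrence count in `cluster` exceeds `threshold`.

    Assumption: a transaction adds at most 1 to each node it touches
    (an item it contains, or any category above such an item).
    """
    counts = {}
    for transaction in cluster:
        touched = {a for item in transaction for a in ancestors(item, parent)}
        for node in touched:
            counts[node] = counts.get(node, 0) + 1
    return {n for n, c in counts.items() if c > threshold and n not in exclude}


def tree_adjacency(parent):
    """Undirected adjacency sets of the taxonomy tree."""
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, set()).add(par)
        adj.setdefault(par, set()).add(child)
    return adj


def distance(item, large, adj):
    """Number of tree links from `item` to its nearest large node (BFS)."""
    seen = {item}
    queue = deque([(item, 0)])
    while queue:
        node, d = queue.popleft()
        if node in large:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")  # no large node is reachable


def adherence(transaction, large, adj):
    """Average distance of the transaction's items to the cluster."""
    if not transaction:
        raise ValueError("empty transaction")
    return sum(distance(i, large, adj) for i in transaction) / len(transaction)


def assign(transaction, clusters, parent, threshold, exclude=frozenset()):
    """Illustrative step only: pick the cluster with the lowest adherence."""
    adj = tree_adjacency(parent)
    scores = {
        name: adherence(transaction, large_nodes(members, parent, threshold, exclude), adj)
        for name, members in clusters.items()
    }
    return min(scores, key=scores.get), scores


CLUSTERS = {
    "A": [{"milk", "cheese"}, {"milk"}, {"cheese"}],
    "B": [{"beer"}, {"beer", "juice"}, {"juice"}],
}
best, scores = assign({"milk", "cheese", "juice"}, CLUSTERS, PARENT, 1, {ROOT})
print(best, scores)  # → A {'A': 1.0, 'B': 2.0}
```

In the example, "juice" is three links from the large node "dairy" in cluster A (juice, drinks, root, dairy), while "milk" and "cheese" are themselves large there, so the adherence to A is (0 + 0 + 3) / 3 = 1.0. The sketch recomputes large nodes on every call; a practical implementation would presumably maintain them per cluster as transactions move, which is omitted here.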

Bibliographic Details

Source
IEEE International Conference on Data Mining | 2002 | pp. 546-553 | 8 pages
Authors
Ching-Huang Yun; Kun-Ta Chuang; Ming-Syan Chen;
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunication technology
Keywords
marketing; data mining; entropy; pattern clustering; category-based adherence; market-basket data; data clustering; transactions clustering; taxonomy; marketing strategies; category node;

Similar Documents

1. Adherence clustering: an efficient method for mining market-basket clusters [J]. Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen. Information Systems, 2006, no. 3.

2. Market-Basket Analysis using Agglomerative Hierarchical approach for clustering a retail items [J]. Rujata Saraf, Sonal Patil. International Journal of Computer Science and Network Security, 2016, no. 3.

3. A data-driven typology of asthma medication adherence using cluster analysis [J]. Holly Tibble, Amy Chan, Edwin A. Mitchell, et al. Scientific Reports, 2020, no. 1.

4. Using category-based adherence to cluster market-basket data [C]. Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen. IEEE International Conference on Data Mining, 2002.

5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to indiana water quality data with novel visualization techniques [D]. Sarabjit Singh. 2014.

6. The effect of mobile phone text message reminders on health workers’ adherence to case management guidelines for malaria and other diseases in Malawi: lessons from qualitative data from a cluster-randomized trial [O]. Blessings N. Kaunda-Khangamwa, Laura C. Steinhardt, Alexander K. Rowe, et al. 2018.

7. A Scalable Approach to Balanced, High-dimensional Clustering of Market-baskets [O]. Alexander Strehl, Joydeep Ghosh. 2000.
