Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data

Abstract

In this paper, we devise an efficient algorithm for clustering market-basket data items. Market-basket data analysis has been well studied in the context of mining association rules, where the goal is to discover the set of large items, i.e., the items purchased frequently across all transactions. In essence, clustering divides a set of data items into groups such that items in the same group are as similar to one another as possible. In view of the nature of clustering market-basket data, we propose a measure called the small-large (SL) ratio, which is essentially the ratio of the number of small items to the number of large items. Clearly, the smaller the SL ratio of a cluster, the more similar to one another the items in the cluster are. Then, by utilizing a self-tuning technique that adaptively tunes the input and output SL ratio thresholds, we develop an efficient clustering algorithm, algorithm STC (Self-Tuning Clustering), for clustering market-basket data. The objective of algorithm STC is: "Given a database of transactions, determine a clustering such that the average SL ratio is minimized." We conduct several experiments on real data and synthetic workloads for performance studies. Our experimental results show that, by utilizing the self-tuning technique to adaptively minimize the input and output SL ratio thresholds, algorithm STC performs very well. Specifically, algorithm STC not only incurs significantly less execution time than prior works but also produces clustering results of very good quality.
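The abstract defines the SL ratio only informally, as the ratio of small items to large items in a cluster. The Python sketch below makes one reading of that definition concrete. It assumes (this is not stated in the abstract) that an item is "large" in a cluster when the fraction of the cluster's transactions containing it is at least a threshold, here a hypothetical parameter `min_support`, and that the average SL ratio is the unweighted mean over non-empty clusters. The function names are hypothetical. The sketch only scores a given clustering against the stated objective; it does not implement STC's self-tuning of the input and output SL ratio thresholds.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Transaction = FrozenSet[str]
Cluster = List[Transaction]


def split_items(cluster: Cluster, min_support: float) -> Tuple[Set[str], Set[str]]:
    """Return (large_items, small_items) for one cluster.

    ASSUMPTION: an item is large if the fraction of the cluster's
    transactions that contain it is >= min_support; otherwise it is small.
    """
    counts = Counter(item for t in cluster for item in t)
    n = len(cluster)
    large = {item for item, c in counts.items() if c / n >= min_support}
    small = set(counts) - large
    return large, small


def sl_ratio(cluster: Cluster, min_support: float) -> float:
    """Number of distinct small items divided by number of distinct large items."""
    large, small = split_items(cluster, min_support)
    if not large:
        # No large items at all: treat the cluster as maximally dissimilar.
        return float("inf")
    return len(small) / len(large)


def average_sl_ratio(clusters: List[Cluster], min_support: float) -> float:
    """Unweighted mean SL ratio over the non-empty clusters (assumed objective)."""
    ratios = [sl_ratio(c, min_support) for c in clusters if c]
    if not ratios:
        raise ValueError("clustering has no non-empty clusters")
    return sum(ratios) / len(ratios)


# Toy market-basket data, invented for illustration.
clusters = [
    [frozenset({"bread", "milk"}), frozenset({"bread", "milk", "eggs"}),
     frozenset({"bread", "milk"})],
    [frozenset({"beer", "chips"}), frozenset({"beer", "diapers"})],
]

print(average_sl_ratio(clusters, 0.6))                     # 1.25
print(average_sl_ratio([clusters[0] + clusters[1]], 0.6))  # 2.0
```

Under these assumptions, keeping the bread/milk baskets apart from the beer baskets gives a lower average SL ratio (1.25) than merging all five transactions into one cluster (2.0), which is the direction the stated objective rewards.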

Bibliographic Details

Source: Data Warehousing and Knowledge Discovery | 2002 | pp. 42-51 | 10 pages
Authors: Ching-Huang Yun; Kun-Ta Chuang; Ming-Syan Chen
Original format: PDF
Chinese Library Classification: Automation technology, computer technology
Keywords: data mining; clustering market-basket data; small-large ratios; adaptive self-tuning

Similar Literature

1. DMCM: a Data-adaptive Mutation Clustering Method to identify cancer-related mutation clusters [J]. Lu Xinguo, Qian Xin, Li Xing. Bioinformatics, 2019, No. 3.
2. A Data Cleansing Method for Clustering Large-Scale Transaction Databases [J]. Woong-Kee Loh, Yang-Sae Moon, Jun-Gyu Kang. IEICE Transactions on Information and Systems, 2010, No. 11.
3. Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data [C]. Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen. International Conference on Data Warehousing and Knowledge Discovery, 2002.
4. The catalog of galaxy clusters obtained by an adaptive matched filter method applied to the Sloan Digital Sky Survey Data Release Six [D]. Szabo, Thad P. 2010.
5. Review of methods for handling confounding by cluster and informative cluster size in clustered data [O]. Shaun Seaman, Menelaos Pavlou, Andrew Copas.
6. DMCM: a Data-adaptive Mutation Clustering Method to identify cancer-related mutation clusters [O]. Xinguo Lu, Xin Qian, Xing Li. 2018.