International Conference on Data Warehousing and Knowledge Discovery

Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data

Abstract

In this paper, we devise an efficient algorithm for clustering market-basket data items. Market-basket data analysis has been well addressed in association rule mining, which discovers the set of large items, i.e., the items frequently purchased across all transactions. In essence, clustering divides a set of data items into proper groups in such a way that items in the same group are as similar to one another as possible. In view of the nature of clustering market-basket data, we present a measurement, called the small-large (SL) ratio, which is in essence the ratio of the number of small items to that of large items. Clearly, the smaller the SL ratio of a cluster, the more similar to one another the items in the cluster are. Then, by utilizing a self-tuning technique for adaptively tuning the input and output SL ratio thresholds, we develop an efficient clustering algorithm, STC (standing for Self-Tuning Clustering), for clustering market-basket data. The objective of algorithm STC is: "Given a database of transactions, determine a clustering such that the average SL ratio is minimized." We conduct several experiments on the data and on a synthetic workload for performance studies. Our experimental results show that, by utilizing the self-tuning technique to adaptively minimize the input and output SL ratio thresholds, algorithm STC performs very well. Specifically, algorithm STC not only incurs an execution time significantly smaller than that of prior works but also produces clustering results of very good quality.
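
The SL ratio described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how an item is classified as large or small, so this sketch assumes a single hypothetical `min_support` threshold: an item is large if it appears in at least that fraction of a cluster's transactions, and small otherwise. The function names and the threshold value are illustrative assumptions, not the paper's definitions, and the self-tuning step of algorithm STC is not modeled here.

```python
from collections import Counter


def sl_ratio(cluster, min_support=0.5):
    """Return (#small items) / (#large items) for one cluster of transactions.

    Assumption (not from the paper): an item is "large" when its support
    within the cluster is >= min_support, and "small" otherwise.
    """
    n = len(cluster)
    # Count, for each item, how many transactions in the cluster contain it.
    counts = Counter(item for txn in cluster for item in set(txn))
    large = {item for item, c in counts.items() if c / n >= min_support}
    small = set(counts) - large
    if not large:
        # No large items at all: treat the cluster as maximally dissimilar.
        return float("inf")
    return len(small) / len(large)


def average_sl_ratio(clusters, min_support=0.5):
    """Average SL ratio over all clusters, the quantity STC aims to minimize."""
    return sum(sl_ratio(c, min_support) for c in clusters) / len(clusters)


# Toy data (made up for illustration only).
tight = [{"milk", "bread"}, {"milk", "bread"}, {"milk", "bread", "eggs"}]
loose = [{"milk"}, {"beer"}, {"soap", "milk"}]

print(sl_ratio(tight))                   # 1 small item / 2 large items
print(sl_ratio(loose))                   # 2 small items / 1 large item
print(average_sl_ratio([tight, loose]))
```

Under these assumptions the cluster whose transactions share most of their items gets the lower SL ratio, which matches the abstract's claim that a smaller SL ratio indicates more similar items.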