Conference: Data Warehousing and Knowledge Discovery

Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data

Abstract

In this paper, we devise an efficient algorithm for clustering market-basket data items. Market-basket data analysis has been well addressed in association rule mining, which discovers the set of large items, i.e., the items frequently purchased across all transactions. In essence, clustering divides a set of data items into groups such that items in the same group are as similar to one another as possible. In view of the nature of clustering market-basket data, we present a measurement, called the small-large (SL) ratio, which is in essence the ratio of the number of small items to that of large items. Clearly, the smaller the SL ratio of a cluster, the more similar to one another the items in the cluster are. Then, by utilizing a self-tuning technique that adaptively tunes the input and output SL ratio thresholds, we develop an efficient clustering algorithm, algorithm STC (standing for Self-Tuning Clustering), for clustering market-basket data. The objective of algorithm STC is: "Given a database of transactions, determine a clustering such that the average SL ratio is minimized." We conduct several experiments on real data and a synthetic workload for performance studies. Our experimental results show that, by utilizing the self-tuning technique to adaptively minimize the input and output SL ratio thresholds, algorithm STC performs very well. Specifically, algorithm STC not only incurs an execution time significantly smaller than that of prior works but also produces clustering results of very good quality.
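
To make the SL ratio concrete, here is a minimal Python sketch of how it might be computed for one cluster. The abstract only says the ratio compares the number of small items to the number of large items; the rule used here (an item is "large" when it appears in at least a `min_support` fraction of the cluster's transactions, and "small" otherwise) is an assumption for illustration, as are the function names `sl_ratio` and `average_sl_ratio` and the default threshold. It is not the STC algorithm itself, which also tunes thresholds adaptively.

```python
from collections import Counter


def sl_ratio(cluster, min_support=0.5):
    """Return (#small items) / (#large items) for one cluster of transactions.

    Assumption (not from the abstract): an item is "large" if the fraction of
    transactions in the cluster that contain it is >= min_support; every other
    item that occurs in the cluster is "small".
    """
    if not cluster:
        raise ValueError("cluster must contain at least one transaction")
    # Count, for each item, how many transactions in the cluster contain it.
    counts = Counter(item for txn in cluster for item in set(txn))
    n = len(cluster)
    large = sum(1 for c in counts.values() if c / n >= min_support)
    small = len(counts) - large
    # With no large items the ratio is unbounded; report it as infinity.
    return float("inf") if large == 0 else small / large


def average_sl_ratio(clusters, min_support=0.5):
    """Average SL ratio over clusters, the quantity the stated objective minimizes."""
    return sum(sl_ratio(c, min_support) for c in clusters) / len(clusters)


# Hypothetical toy data: one cohesive cluster and one scattered cluster.
tight = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "bread"}]
loose = [{"milk", "bread"}, {"beer", "chips"}, {"soap", "milk"}]

print(sl_ratio(tight))
print(sl_ratio(loose))
print(average_sl_ratio([tight, loose]))
```

Under these assumptions the cohesive cluster gets the lower ratio, which matches the abstract's remark that a smaller SL ratio indicates more similar contents.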
