DBDC: Density Based Distributed Clustering

Abstract

Clustering has become an increasingly important task in modern application domains such as marketing and purchasing assistance, multimedia, molecular biology as well as many others. In most of these areas, the data are originally collected at different sites. In order to extract information from these data, they are merged at a central site and then clustered. In this paper, we propose a different approach. We cluster the data locally and extract suitable representatives from these clusters. These representatives are sent to a global server site where we restore the complete clustering based on the local representatives. This approach is very efficient, because the local clusterings can be carried out quickly and independently from each other. Furthermore, we have low transmission cost, as the number of transmitted representatives is much smaller than the cardinality of the complete data set. Based on this small number of representatives, the global clustering can be done very efficiently. For both the local and the global clustering, we use a density based clustering algorithm. The combination of both the local and the global clustering forms our new DBDC (Density Based Distributed Clustering) algorithm. Furthermore, we discuss the complex problem of finding a suitable quality measure for evaluating distributed clusterings. We introduce two quality criteria which are compared to each other and which allow us to evaluate the quality of our DBDC algorithm. In our experimental evaluation, we will show that we do not have to sacrifice clustering quality in order to gain an efficiency advantage when using our distributed clustering approach.
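The local-then-global flow described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how representatives are chosen or how the global parameters are set, so the sketch assumes one centroid per local cluster as the representative and a hand-picked `eps_global`. The names `dbscan`, `local_representatives`, and `dbdc_sketch` are hypothetical, and the density-based step is a plain textbook DBSCAN.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def local_representatives(points, eps, min_pts):
    """Cluster one site's data and return (labels, representatives).

    Assumption: each local cluster is summarized by its centroid.
    """
    labels = dbscan(points, eps, min_pts)
    reps = []
    for c in sorted(set(labels) - {-1}):
        members = [p for p, l in zip(points, labels) if l == c]
        reps.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return labels, reps


def dbdc_sketch(sites, eps_local, min_pts, eps_global):
    """Cluster each site locally, then cluster only the representatives."""
    local = [local_representatives(pts, eps_local, min_pts) for pts in sites]
    all_reps = [r for _, reps in local for r in reps]
    # Assumption: min_pts=1 globally, so every representative is a core point.
    global_labels = dbscan(all_reps, eps_global, 1)
    result, offset = [], 0
    for labels, reps in local:
        mapping = global_labels[offset:offset + len(reps)]
        offset += len(reps)
        # Relabel each local point with its representative's global cluster.
        result.append([-1 if l == -1 else mapping[l] for l in labels])
    return result


site_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
site_b = [(0.6, 0.0), (0.7, 0.0), (0.6, 0.1), (0.7, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
result = dbdc_sketch([site_a, site_b], eps_local=0.2, min_pts=3, eps_global=1.0)
print(result)  # -> [[0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1]]
```

In this toy run, only three representatives travel to the global step instead of eleven points, and the two nearby local clusters from different sites end up in the same global cluster.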
