
DBDC: Density Based Distributed Clustering



Abstract

Clustering has become an increasingly important task in modern application domains such as marketing and purchasing assistance, multimedia, and molecular biology, among many others. In most of these areas, the data are originally collected at different sites. In order to extract information from these data, they are merged at a central site and then clustered. In this paper, we propose a different approach. We cluster the data locally and extract suitable representatives from these clusters. These representatives are sent to a global server site, where we restore the complete clustering based on the local representatives. This approach is very efficient because the local clusterings can be carried out quickly and independently of one another. Furthermore, the transmission cost is low, as the number of transmitted representatives is much smaller than the cardinality of the complete data set. Based on this small number of representatives, the global clustering can be done very efficiently. For both the local and the global clustering, we use a density-based clustering algorithm. The combination of the local and the global clustering forms our new DBDC (Density Based Distributed Clustering) algorithm. We also discuss the complex problem of finding a suitable quality measure for evaluating distributed clusterings. We introduce two quality criteria, compare them with each other, and use them to evaluate the quality of our DBDC algorithm. Our experimental evaluation shows that clustering quality does not have to be sacrificed in order to gain an efficiency advantage when using our distributed clustering approach.
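The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how representatives are chosen, so this sketch assumes the centroid of each local cluster as a simple stand-in, and it assumes a plain DBSCAN-style procedure for both stages. All names (`dbscan`, `local_representatives`, `dbdc_sketch`) and parameters (`eps_local`, `eps_global`, `min_pts`) are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN-style clustering: one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

def local_representatives(points, labels):
    """Assumption of this sketch: represent each local cluster by its centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    return [tuple(sum(c) / len(ps) for c in zip(*ps)) for ps in clusters.values()]

def dbdc_sketch(sites, eps_local, min_pts, eps_global):
    """Cluster each site locally, then cluster only the representatives globally."""
    reps = []
    for points in sites:
        labels = dbscan(points, eps_local, min_pts)
        reps.extend(local_representatives(points, labels))  # only these are "sent"
    # min_pts=1 on the server: every representative already stands for a cluster.
    return reps, dbscan(reps, eps_global, 1)

# Two hypothetical sites; site_b holds one outlier at (50, 50).
site_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
site_b = [(1.5, 0), (1.5, 1), (2.5, 0), (2.5, 1), (50, 50)]
reps, global_labels = dbdc_sketch([site_a, site_b], eps_local=1.5, min_pts=3, eps_global=2.0)
print(reps, global_labels)
```

In this toy run, 13 points are reduced to three representatives, and the server merges the nearby clusters from the two sites into one global cluster. Centroids can misrepresent elongated or non-convex clusters, so a practical system would need a more careful choice of representatives.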
