Journal: Computing and Informatics

New Algorithm for Clustering Distributed Data Using k-Means

Abstract

The internet era and high-speed networks have made large amounts of geographically distributed data readily accessible. Individuals, businesses, and governments recognize the value of this resource to those who can transform the data into information. These databases, though valuable as individual entities, become significantly more valuable when they function as parts of a federated database and their data can be aggregated for collective mining or computation. This requires new algorithms to shift their focus from working with single databases to working efficiently with federated databases. In this paper, we propose a new decomposable version of the popular k-means clustering algorithm that works in this manner with a set of networked databases. We show that it is possible to perform global computation in a reasonably secure manner for either horizontally or vertically distributed databases. The computation is completed by exchanging only a few local summaries among the databases. An empirical and analytical validation of our results is also presented.
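
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a k-means update can decompose over horizontally partitioned data. It is not the paper's method. It assumes each site holds complete rows, that a coordinator broadcasts the current centroids, and that each site returns per-cluster coordinate sums and counts as its "local summary". The function names (`local_summary`, `merge_summaries`, `distributed_kmeans`) are hypothetical, and the sketch says nothing about the vertically partitioned case or about the security properties claimed in the abstract.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[List[List[float]], List[int]]


def nearest(point: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_summary(rows: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Runs at one site: per-cluster coordinate sums and row counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for row in rows:
        j = nearest(row, centroids)
        counts[j] += 1
        for d, value in enumerate(row):
            sums[j][d] += value
    return sums, counts


def merge_summaries(summaries: Sequence[Summary],
                    centroids: Sequence[Point]) -> List[Point]:
    """Runs at the coordinator: combine site summaries into new centroids."""
    merged = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in summaries)
        if total == 0:
            merged.append(tuple(old))  # empty cluster: keep the old centroid
            continue
        merged.append(tuple(
            sum(sums[j][d] for sums, _ in summaries) / total
            for d in range(len(old))
        ))
    return merged


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       max_iter: int = 100,
                       tol: float = 1e-9) -> List[Point]:
    """Iterate until no centroid moves by more than tol (squared distance)."""
    current = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        summaries = [local_summary(rows, current) for rows in sites]
        updated = merge_summaries(summaries, current)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c))
            for u, c in zip(updated, current)
        )
        current = updated
        if shift <= tol:
            break
    return current


if __name__ == "__main__":
    # Three hypothetical sites, each holding a horizontal slice of the rows.
    sites = [
        [(0.0, 0.0), (0.0, 2.0)],
        [(10.0, 10.0), (10.0, 12.0)],
        [(0.0, 1.0), (10.0, 11.0)],
    ]
    print(distributed_kmeans(sites, [(0.0, 0.0), (10.0, 10.0)]))
```

Because a cluster mean is a sum divided by a count, adding the per-site sums and counts gives the same centroid update as running the step on the pooled rows, while only k sums and k counts per site cross the network in each iteration.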
