Journal: Expert Systems with Applications

Combination of genetic network programming and knapsack problem to support record clustering on distributed databases



Abstract

This research implements genetic network programming (GNP) and standard dynamic programming for the knapsack problem (KP) as a decision support system for record clustering in distributed databases. The background of the proposed method is the fragment allocation problem under storage capacity limits: sets of fragments must be distributed among several sites (clusters) so that the total amount of fragments at each site does not exceed that site's capacity, while the relation (similarity) between fragments within each site is preserved. The objective is to distribute big data to sites with limited capacities while taking into account the similarity of the data placed at each site. To solve this problem, GNP is used to extract rules from big data by considering the characteristics (value ranges) of each attribute in a dataset. The proposed method also provides a partial random rule extraction method in GNP that discovers frequent patterns in a database in order to improve the clustering algorithm, especially for large data problems. The concept of KP is applied to the storage capacity problem, and standard dynamic programming is used to distribute rules to each site by considering the similarity (value) and data amount (weight) associated with each rule so as to match the site capacities. Simulation results show that the proposed method has some advantages over conventional clustering algorithms; the proposed method therefore provides a new clustering method that additionally handles a storage capacity constraint. (C) 2015 Elsevier Ltd. All rights reserved.
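
The mapping described in the abstract (similarity as value, data amount as weight, site capacity as the knapsack limit) can be illustrated with the textbook 0/1 knapsack recurrence. The sketch below is not the authors' implementation: the function name `knapsack_select`, the single-site setting, the integer weights, and the sample numbers are all assumptions made for illustration, and the abstract does not say how rules are divided among several sites.

```python
from typing import List, Tuple


def knapsack_select(values: List[int], weights: List[int],
                    capacity: int) -> Tuple[int, List[int]]:
    """Textbook 0/1 knapsack solved by dynamic programming.

    Returns the best total value and the indices of the chosen items.
    Weights and capacity are assumed to be non-negative integers.
    """
    n = len(values)
    # best[i][c] = best total value using the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take item i-1

    # Walk back through the table to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


# Hypothetical rules: similarity scores (value) and record counts (weight).
similarity = [60, 100, 120]
record_count = [10, 20, 30]
total, picked = knapsack_select(similarity, record_count, capacity=50)
print(total, picked)  # prints: 220 [1, 2]
```

With a site capacity of 50 records, the second and third hypothetical rules are selected: together they use exactly 50 records and give the highest combined similarity score (220).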
