Intelligent Methods of Fusing the Knowledge During Incremental Learning via Clustering in A Distributed Environment

Abstract

One way to learn from data that is physically distributed over multiple locations is to run a common learning mechanism at each source and transmit the knowledge of each learnt concept to a centralized location for assimilation. In this research, clustering is employed as the learning mechanism, and a cluster is viewed as a concept described by a set of variables. The set of variables that describes a cluster is referred to as a knowledge packet (KP). Because histograms have the generic ability to characterize any type of data, a histogram-based regression line is used as one of the variables describing a KP. To allow online monitoring of the progression of learning, and to achieve computational ease and efficacy, the KPs at the centralized location are fused incrementally to obtain the overall knowledge. If the learning mechanisms employed are sensitive to data sequence, different combinations of merging the generated KPs may result in altogether different overall knowledge. Further, the distance measure used to find the distance between KPs when determining the optimal merging sequence may also result in different overall knowledge. This phenomenon is referred to as the order effect problem. To minimize or avoid the order effect, the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which is insensitive to the order in which data samples are presented, is used to learn from the data chunks, and a novel methodology is presented for finding the distance between batches of data and thereby a more optimal sequence for merging the KPs. A specially designed distance measure for histogram-based objects (histo-objects) is employed to find the distance between KPs, and the nearest KPs are merged incrementally until certain conditions are satisfied. The proposed methods provide a robust mechanism for avoiding order effects.
Since real distributed datasets are difficult to obtain, the effectiveness of the proposed approaches is demonstrated with a carefully designed synthetic dataset. Some benchmark datasets were also modified to simulate the distributed environment, and experiments with some of them show an accuracy of up to 100%.
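
The incremental fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method. The abstract does not specify the histo-object distance, the stopping conditions, or the regression-line descriptor, so the sketch makes several assumptions: each KP is reduced to a plain histogram over a shared value range, the distance is an L1 distance between normalized histograms, and merging stops when the nearest pair is farther apart than a hypothetical `threshold`. The per-site clustering (DBSCAN in the paper) is assumed to have already produced the clusters. All names (`KnowledgePacket`, `make_kp`, `distance`, `merge`, `fuse`) are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations

BINS = 10            # assumed number of histogram bins
LO, HI = 0.0, 10.0   # assumed value range shared by all sites


@dataclass
class KnowledgePacket:
    count: int   # number of samples the packet summarizes
    hist: list   # per-bin sample counts over [LO, HI)


def make_kp(values):
    """Summarize one non-empty cluster of 1-D values as a knowledge packet."""
    hist = [0] * BINS
    width = (HI - LO) / BINS
    for v in values:
        idx = min(BINS - 1, max(0, int((v - LO) / width)))
        hist[idx] += 1
    return KnowledgePacket(len(values), hist)


def distance(a, b):
    """Stand-in distance (an assumption): L1 between normalized histograms."""
    return sum(abs(x / a.count - y / b.count) for x, y in zip(a.hist, b.hist))


def merge(a, b):
    """Fuse two packets by adding their bin counts."""
    return KnowledgePacket(a.count + b.count,
                           [x + y for x, y in zip(a.hist, b.hist)])


def fuse(packets, threshold):
    """Repeatedly merge the nearest pair until it is farther than threshold."""
    packets = list(packets)
    while len(packets) > 1:
        i, j = min(combinations(range(len(packets)), 2),
                   key=lambda p: distance(packets[p[0]], packets[p[1]]))
        if distance(packets[i], packets[j]) > threshold:
            break
        merged = merge(packets[i], packets[j])
        packets = [p for k, p in enumerate(packets) if k not in (i, j)]
        packets.append(merged)
    return packets


# Two hypothetical sites, each reporting a low-valued and a high-valued cluster.
site_a = [make_kp([1.1, 1.5, 1.8, 2.2]), make_kp([8.1, 8.5, 8.9])]
site_b = [make_kp([1.2, 1.6, 2.1]), make_kp([8.3, 8.7, 9.2])]
overall = fuse(site_a + site_b, threshold=1.0)
```

Because the nearest pair is always chosen over the whole set of packets, the result in this toy case does not depend on the order in which the sites' packets arrive, which is the property the abstract aims for.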
