
Object and relational clustering based on new robust estimators and genetic niching with applications to Web mining.



Abstract

In this dissertation, we present new robust estimators that attempt to overcome the disadvantages of most existing robust estimation techniques. We also present new robust clustering algorithms based on these estimators, and a novel approach to unsupervised clustering based on genetic niching. The resulting clustering algorithms are applied successfully to mine user profiles from real Web access logs.

The Maximal Density Estimator technique (MDE) is a new linear complexity robust estimator that is free of any presuppositions about the contamination rate in noisy data sets. The Multivariate MDE (MMDE) generalizes MDE for multivariate data sets. MDE and MMDE are computationally attractive and quite insensitive to initialization. Our theoretical analysis shows that MDE and MMDE can be considered as new M-estimators that estimate both location and scale simultaneously, and that they can be expected to be sufficiently protected against very large outliers without compromising their efficiency. Based on MDE and MMDE, we present two new robust clustering algorithms, two unsupervised robust clustering procedures for the case when the number of clusters is unknown, and a new robust relational clustering algorithm that can deal with complex and subjective dissimilarity/similarity measures that are not restricted to be Euclidean.

We explore the use of genetic algorithms in robust clustering in several ways. We extend the objective function of the Least Median of Squares (LMedS) estimator so that it can simultaneously partition a given data set into C clusters, and design a genetic algorithm to search the solution space more efficiently. We also present a novel approach to unsupervised robust clustering, called Unsupervised Niche Clustering (UNC), based on genetic niching and an improved restricted mating scheme to alleviate the problem of crossover interaction between distinct niches.

We introduce a new approach to Web mining based on the extraction of different user profiles from very large amounts of semi-structured Web access log data. We define the notion of a "user session", and present a new subjective dissimilarity measure between two Web sessions. We apply our new robust relational clustering algorithm to extract typical robust session profiles that reflect distinct user interests from real server logs. We also present a hierarchical approach to clustering the Web sessions based on UNC (HUNC) which is computationally much simpler and can determine the number of clusters automatically. This approach offers the advantage of multi-resolution profiling.
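
The abstract does not state the extended LMedS objective, so the following is only an illustrative sketch. The first function is the standard LMedS criterion for a location estimate (the median of squared distances). The second, `clustered_lmeds_objective`, is a hypothetical C-cluster variant assumed here for illustration: each point is assigned to its nearest center and the per-cluster medians are summed. Both function names and the assignment rule are assumptions, not the dissertation's formulation.

```python
import numpy as np

def lmeds_objective(points, center):
    """Standard LMedS criterion for one location estimate:
    the median of the squared distances to the candidate center."""
    d2 = np.sum((points - center) ** 2, axis=1)
    return float(np.median(d2))

def clustered_lmeds_objective(points, centers):
    """Hypothetical C-cluster variant (an assumption, not the author's form):
    assign each point to its nearest center, then sum the per-cluster
    medians of squared distances. Empty clusters contribute nothing."""
    # Squared distance from every point to every center, shape (n, C).
    d2 = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    labels = np.argmin(d2, axis=1)
    total = 0.0
    for k in range(centers.shape[0]):
        member_d2 = d2[labels == k, k]
        if member_d2.size:
            total += float(np.median(member_d2))
    return total

# One gross outlier barely moves the median-based criterion.
noisy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
single = lmeds_objective(noisy, np.array([0.0, 0.0]))

# Two well-separated groups: centers on the groups score lower.
data = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0],
                 [10.0, 0.0], [10.0, 1.0], [10.0, 2.0]])
good = clustered_lmeds_objective(data, np.array([[0.0, 1.0], [10.0, 1.0]]))
bad = clustered_lmeds_objective(data, np.array([[0.0, 1.0], [5.0, 1.0]]))
```

A search procedure such as the genetic algorithm mentioned above would treat a set of candidate centers as one individual and use an objective of this kind as its fitness; that search is not shown here.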
