A model for mining distributed frequent sequences.

Abstract

Data mining aims to discover patterns and extract useful information from facts recorded in databases. However, real-life applications are inherently distributed, so distributed data mining is a more natural way to view data mining in general. A common approach to mining geographically distributed data is to build separate models at the distributed sites and then combine the models at a central site. At the other extreme, all of the data can be moved to a central site and a single model built. Over the commodity internet and with large datasets, the former approach is quicker but often less accurate, while the latter is more accurate but generally quite expensive in terms of the time required.

This thesis introduces a new framework and methodology for distributed data mining (DDM) that is an intermediate solution between these two approaches. It is intermediate because it adopts the first approach but combines models only when there is strong evidence of their similarity. This improves accuracy and shortens response time. In this model, differences and similarities between distributed data sites are explicitly addressed and expressed as similarity values between sites.

The framework reduces the problem to a similarity problem between models. Solving the reduced problem requires a similarity measure. A measure based on the idea that similarity should reflect how much work is needed to transform one model into another is formalized and verified through experiments. Applying this similarity measure to datasets places them in clusters according to their similarities. In the end, the sites in each cluster participate in one global model.

Experiments on this framework show that models resulting from the proposed strategy give better results than those resulting from the central strategy. They also show that the framework effectively bridges the two simple approaches to distributed data mining that are common today.
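
The clustering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the thesis's similarity measure, so this sketch substitutes a stand-in: each local model is assumed to be a set of frequent sequences, and the "work to transform one model into another" is approximated by the fraction of patterns that would have to be added or removed (the Jaccard distance). The function names, the site names, the example patterns, and the `max_distance` threshold are all hypothetical.

```python
from itertools import combinations


def transformation_distance(model_a, model_b):
    """Stand-in measure: share of patterns to add or remove to turn one model into the other."""
    a, b = set(model_a), set(model_b)
    union = a | b
    if not union:
        return 0.0  # two empty models are treated as identical
    return len(a ^ b) / len(union)


def cluster_sites(models, max_distance):
    """Group sites whose models lie within max_distance of each other (single linkage)."""
    parent = {site: site for site in models}

    def find(site):
        # Union-find lookup with path halving.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    for s, t in combinations(models, 2):
        if transformation_distance(models[s], models[t]) <= max_distance:
            parent[find(s)] = find(t)

    clusters = {}
    for site in models:
        clusters.setdefault(find(site), []).append(site)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical sites; each local model is a set of frequent sequences.
site_models = {
    "site_a": {("x", "y"), ("y", "z"), ("x", "z")},
    "site_b": {("x", "y"), ("y", "z"), ("x", "w")},
    "site_c": {("p", "q"), ("q", "r")},
}
clusters = cluster_sites(site_models, max_distance=0.5)
print(clusters)
```

In this sketch, sites that land in the same cluster would then contribute to one shared model, while a site with no sufficiently similar neighbor keeps its own. How the thesis actually merges models within a cluster is not stated in the abstract.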

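Bibliographic record

  • Author

    Soliman, Maha Mohamed

  • Author affiliation

    University of Louisville

  • Degree-granting institution: University of Louisville
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 103 p.
  • Total pages: 103
  • Original format: PDF
  • Language of text: English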
