A model for mining distributed frequent sequences.


Abstract

Data mining aims to discover patterns and extract useful information from facts recorded in databases. However, real-life applications are inherently distributed, so distributed data mining is a more natural way to view data mining in general. A common approach when mining geographically distributed data is to build separate models at the distributed sites and then combine the models at a central site. At the other extreme, all of the data can be moved to a central site and a single model built. With commodity internet and large datasets, the former approach is quicker but often less accurate, while the latter is more accurate but generally quite expensive in terms of the time required.

This thesis introduces a new framework and methodology for distributed data mining (DDM) that is an intermediate solution between these two approaches. It is intermediate because it adopts the first approach but combines models only when strong evidence of their similarity exists. This improves accuracy and shortens response time. In this model, differences and similarities between distributed data sites are explicitly addressed and expressed via similarity values between sites.

The framework reduces the problem to a similarity problem between models. Solving the reduced problem requires a similarity measure. A similarity measure based on the idea that similarity should reflect how much work is needed to transform one model into another is formalized and verified through experiments. Applying this similarity measure to datasets places them in clusters according to their similarities. Finally, the sites in each cluster participate in one global model.

Experiments on this framework show that models resulting from the proposed strategy give better results than those resulting from the central strategy. They also show that the framework effectively bridges the two simple approaches to distributed data mining that are common today.
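
The pipeline described in the abstract (score pairwise similarity between local models, cluster the sites, then build one global model per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the thesis's actual formulation: local models are represented as maps from a frequent sequence to its support, the "work to transform one model into another" is approximated by the total support that must be added or removed, sites are grouped by a simple threshold on similarity, and a cluster's global model is the average of its members' supports. The names `similarity`, `cluster_sites`, `merge`, and the sample `sites` data are hypothetical.

```python
from itertools import combinations

# Hypothetical local model: frequent sequence (tuple of items) -> relative support.
Model = dict[tuple[str, ...], float]


def similarity(a: Model, b: Model) -> float:
    """Return 1 minus a normalized transformation cost between two models.

    Assumption (not from the thesis): the cost of turning model a into
    model b is the support that must be added or removed per pattern.
    """
    patterns = set(a) | set(b)
    cost = sum(abs(a.get(p, 0.0) - b.get(p, 0.0)) for p in patterns)
    total = sum(max(a.get(p, 0.0), b.get(p, 0.0)) for p in patterns)
    return 1.0 - cost / total if total else 1.0


def cluster_sites(models: dict[str, Model], threshold: float) -> list[set[str]]:
    """Group sites whose pairwise similarity reaches the threshold (union-find)."""
    parent = {site: site for site in models}

    def find(site: str) -> str:
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path compression
            site = parent[site]
        return site

    for s, t in combinations(models, 2):
        if similarity(models[s], models[t]) >= threshold:
            parent[find(s)] = find(t)

    clusters: dict[str, set[str]] = {}
    for site in models:
        clusters.setdefault(find(site), set()).add(site)
    return list(clusters.values())


def merge(models: list[Model]) -> Model:
    """Build one global model for a cluster by averaging supports (an assumption)."""
    patterns = set().union(*models)
    return {p: sum(m.get(p, 0.0) for m in models) / len(models) for p in patterns}


# Made-up example data: sites A and B hold similar models, site C does not.
sites: dict[str, Model] = {
    "A": {("a", "b"): 0.6, ("b", "c"): 0.4},
    "B": {("a", "b"): 0.5, ("b", "c"): 0.5},
    "C": {("x", "y"): 0.7},
}
clusters = cluster_sites(sites, threshold=0.7)
global_models = [merge([sites[s] for s in cluster]) for cluster in clusters]
```

With this toy data, A and B are merged into one global model while C keeps its own, which mirrors the idea of combining models only when there is strong evidence of similarity. The threshold value and the averaging rule are arbitrary choices for the sketch.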

Bibliographic record

Author: Soliman, Maha Mohamed
Author affiliation: University of Louisville
Degree-granting institution: University of Louisville
Subject: Computer science
Degree: Ph.D.
Year: 2004
Pages: 103
Original format: PDF
Language of text: English
