Parallel mining of association rules from text databases on a cluster of workstations

Abstract

Summary form only given. We propose a new algorithm named Parallel Multipass with Inverted Hashing and Pruning (PMIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. The new PMIHP algorithm is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm, which was shown to be more efficient than other existing algorithms in the context of mining text databases. The PMIHP algorithm reduces the overhead of communication between miners running on different processors because they mine their local databases asynchronously and prune the global candidates using the inverted hashing and pruning technique.
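
The abstract does not spell out how inverted hashing and pruning works, so the following is only a minimal single-process sketch of one plausible reading, not the authors' PMIHP implementation. It assumes that each word keeps a small hash table counting the documents that contain it, bucketed by document ID, and that the bucket-wise minimum over the words in a candidate set gives an upper bound on the candidate's support, so a candidate whose bound falls below the threshold can be dropped without scanning the documents. The names `build_tid_tables`, `support_upper_bound`, `frequent_pairs`, and `NUM_BUCKETS` are hypothetical, and the parallel, asynchronous part of the algorithm is not modelled.

```python
from collections import defaultdict
from itertools import combinations

NUM_BUCKETS = 4  # hypothetical size of each per-word hash table


def build_tid_tables(docs, num_buckets=NUM_BUCKETS):
    """Map each word to per-bucket counts of the document IDs that contain it."""
    tables = defaultdict(lambda: [0] * num_buckets)
    for doc_id, words in enumerate(docs):
        for word in set(words):  # count each word once per document
            tables[word][doc_id % num_buckets] += 1
    return tables


def support_upper_bound(tables, itemset):
    """Upper bound on how many documents contain every word in itemset.

    Within one bucket, the documents shared by all words cannot outnumber
    the smallest per-word count, so summing the bucket-wise minimum is safe.
    """
    per_word = [tables[word] for word in itemset]
    return sum(min(counts) for counts in zip(*per_word))


def frequent_pairs(docs, min_support):
    """Return word pairs that occur together in at least min_support documents."""
    tables = build_tid_tables(docs)
    frequent_words = sorted(w for w, t in tables.items() if sum(t) >= min_support)
    doc_sets = [set(d) for d in docs]
    result = {}
    for pair in combinations(frequent_words, 2):
        if support_upper_bound(tables, pair) < min_support:
            continue  # pruned: the bound shows the pair cannot be frequent
        count = sum(1 for d in doc_sets if d.issuperset(pair))
        if count >= min_support:
            result[pair] = count
    return result


docs = [
    ["data", "mining", "text"],
    ["data", "mining"],
    ["text", "rules"],
    ["data", "rules", "mining"],
]
pairs = frequent_pairs(docs, min_support=2)
```

In this toy input, five of the six candidate pairs are pruned by the bound alone, and only ("data", "mining") is counted against the documents. With more documents than buckets the bound becomes looser, but it never underestimates the true support.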