Parallel Mining of Top-K Frequent Itemsets in Very Large Text Database

Abstract

Frequent itemset mining is a common and useful task in data mining, but most current mining algorithms cannot be applied to very large text databases. In this paper, we propose parTFI, a novel and efficient parallel algorithm that finds the top-k frequent itemsets of a specified minimum length in a very large text database. Based on a simple data structure, H-struct, parTFI uses a novel logical vertical data partitioning technique to mine top-k frequent itemsets on each mining server in parallel. Our performance study shows that, when processing very large sparse text databases, parTFI outperforms Apriori and FP-growth, two efficient frequent itemset mining algorithms, even when both run with a better-tuned min_support. Furthermore, by creating the H-struct dynamically, parTFI can handle huge datasets that most other algorithms cannot process.
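
To make the task in the abstract concrete, the following is a minimal brute-force sketch of what "top-k frequent itemsets with a specified minimum length" means: the caller supplies k and a minimum itemset length instead of a min_support threshold. It is not parTFI and does not use H-struct, vertical partitioning, or parallelism; the function name `top_k_frequent_itemsets`, its parameters, and the sample documents are hypothetical and chosen only for illustration. Enumerating every subset is exponential in transaction length, so this is suitable only for tiny inputs.

```python
from collections import Counter
from itertools import combinations


def top_k_frequent_itemsets(transactions, k, min_length, max_length=None):
    """Return the k itemsets with the highest support (hypothetical helper).

    Brute force: counts every itemset of at least `min_length` items that
    occurs in some transaction. This only illustrates the problem definition;
    it is not the parTFI algorithm.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        upper = len(items) if max_length is None else min(max_length, len(items))
        for size in range(min_length, upper + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Rank by descending support; break ties by itemset for determinism.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Each "transaction" is the set of terms in one (made-up) text document.
documents = [
    ["data", "mining", "text"],
    ["data", "mining", "parallel"],
    ["data", "text", "parallel"],
    ["data", "mining", "text", "parallel"],
]

top3 = top_k_frequent_itemsets(documents, k=3, min_length=2)
for itemset, support in top3:
    print(itemset, support)
```

No support threshold appears in the call: the effective threshold is whatever support the k-th ranked itemset happens to have, which is the practical appeal of the top-k formulation when a good min_support is hard to guess in advance.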

Bibliographic details

Source: International Conference on Advances in Web-Age Information Management (WAIM 2005); October 11-13, 2005; Hangzhou (CN) | 2005 | pp. 706-712 | 7 pages
Conference location: Hangzhou (CN)
Authors: Yongheng Wang; Yan Jia; Shuqiang Yang
Author affiliation: Institute of Network, Computer School, National University of Defense Technology, Changsha, China
Original format: PDF
Language: English
Chinese Library Classification: Computer networks; Information processing

Similar literature

1. Mining top-k regular-frequent itemsets using database partitioning and support estimation [J]. Komate Amphawan, Philippe Lenca, Athasit Surarerks. Expert Systems with Applications, 2012, No. 2.
2. Discovering Top-k Probabilistic Frequent Itemsets from Uncertain Databases [J]. Haifeng Li, Yuejin Zhang, Ning Zhang. Procedia Computer Science, 2017, No. 1.
3. TKFIM: Top-K frequent itemset mining technique based on equivalence classes [J]. Saood Iqbal, Abdul Shahid, Muhammad Roman. PeerJ Computer Science, 2021.
4. Parallel Mining of Top-K Frequent Itemsets in Very Large Text Database [C]. Yongheng Wang, Yan Jia, Shuqiang Yang. International Conference on Advances in Web-Age Information Management, 2005.
5. Mining Frequent Itemsets of a Central Fill Pharmacy Transaction Database to Enhance the Planogram of Robotic Dispensing System [D]. Sundaramurthy, Sumanth S. 2018.
6. Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets [O]. Sajid Mahmood, Muhammad Shahbaz, Aziz Guergachi.
7. The Top-k Frequent Closed Itemset Mining Using Top-k SAT Problem [O]. Said Jabbour, Lakhdar Sais, Yakoub Salhi. 2015.
