Incremental technique with set of frequent word item sets for mining large Indonesian text data

Abstract

Indonesian text data from social media is one kind of large text data that is interesting to mine. Mining knowledge and insight from large text data requires considerable effort and processing time. Moreover, Indonesian text data from social media contains natural language, including slang, that requires special treatment. We propose an incremental technique for more efficient mining of large text data, using the Set of Frequent Word Itemset (SFWI) representation, which has been shown to preserve the meaning of Indonesian text well. We compared the Frequent Pattern Growth (FP-Growth) algorithm for non-incremental mining with the Compact Pattern Growth (CP-Tree) algorithm for incremental mining. Experiments with 3,200, 5,000, 110,000, and 239,496 texts from Twitter showed that the incremental technique reduces processing time and memory usage when mining large Indonesian text data. The incremental technique with CP-Tree was about 1.66 times faster and about 1.84 times more memory-efficient than non-incremental FP-Growth.
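The general idea of incremental mining mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the CP-Tree or FP-Growth algorithm and not the authors' SFWI implementation: it is a naive support counter, assumed here only to show how counts can be updated from new batches without rescanning earlier texts. The class name `IncrementalItemsetCounter`, the `max_size` cap, the whitespace tokenization, and the sample phrases are all hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Naive incremental support counter for word itemsets (illustration only)."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size tracked (assumption)
        self.counts = Counter()   # itemset (tuple of words) -> number of texts containing it
        self.n_texts = 0          # total texts seen across all batches

    def add_batch(self, texts):
        # Only the new texts are scanned; earlier batches are never re-read.
        for text in texts:
            words = sorted(set(text.lower().split()))  # unique words, canonical order
            for size in range(1, self.max_size + 1):
                for itemset in combinations(words, size):
                    self.counts[itemset] += 1
            self.n_texts += 1

    def frequent(self, min_support):
        # min_support is a fraction of all texts seen so far.
        threshold = min_support * self.n_texts
        return {s: c for s, c in self.counts.items() if c >= threshold}


# Hypothetical sample texts, not taken from the paper's Twitter data.
counter = IncrementalItemsetCounter(max_size=2)
counter.add_batch(["makan nasi goreng", "makan nasi padang"])
counter.add_batch(["minum teh manis", "makan nasi uduk"])
print(counter.frequent(min_support=0.75))
```

With these four sample texts, the itemsets `("makan",)`, `("nasi",)`, and `("makan", "nasi")` each appear in three texts and so meet the 0.75 support threshold. Enumerating every itemset up front grows quickly with text length, which is one reason tree-based structures such as those named in the abstract are used in practice.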

Bibliographic details

Source
International Conference on Cyber and IT Service Management | 2017 | pp. 1-6 | 6 pages
Authors
Dian Saadillah Maylawati; Muhammad Ali Ramdhani; Ali Rahman; Wahyudin Darmalaksana
Original format: PDF
Keywords
Data mining; Media; Social network services; TV; Feature extraction; Natural languages; Databases

Similar literature

1. An Efficient Closed Frequent Item Sets Mining Algorithm-For Mining Closed Frequent Item Sets from Data Streams [J]. Kuthadi Venu Madhav, Selvaraj Rajalakshmi. Journal of computational and theoretical nanoscience. 2016, no. 10
2. Mining Maximum frequent item sets over data streams using Transaction Sliding Window Techniques [J]. ANNURADHA DHULL, NEERAJ YADAV. International journal of computer science and network security. 2014, no. 2
3. Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding Window Techniques [J]. Neeraj, Anuradha. International Journal of Information Technology Convergence and Services (IJITCS). 2013, no. 2
4. Incremental technique with set of frequent word item sets for mining large Indonesian text data [C]. Dian Saadillah Maylawati, Muhammad Ali Ramdhani, Ali Rahman. International Conference on Cyber and IT Service Management. 2017
5. Computational intelligence and data mining techniques using the fire data set. [D]. Storer, Jeremy. 2016
6. A Text Matching Method to Facilitate the Validation of Frequent Order Sets Obtained Through Data Mining [O]. Chengjian Che, Roberto A. Rocha. 2006
7. An Improvised Frequent Pattern Tree Based Association Rule Mining Technique with Mining Frequent Item Sets Algorithm and a Modified Header Table [O]. Agarwal, Vandit, Kushal, Mandhani, Kumar, Dr. Preetham. 2015
