International Conference on Cyber and IT Service Management

Incremental technique with set of frequent word item sets for mining large Indonesian text data


Abstract

Indonesian text from social media is a large and interesting source of data to mine. Extracting knowledge from large text collections takes considerable processing effort and time. Moreover, Indonesian social media text is written in natural language, including slang, which requires special treatment. We propose an incremental technique that makes mining large text data more efficient, using the Set of Frequent Word Itemsets (SFWI) representation, which has been shown to preserve the meaning of Indonesian text well. We compared the Frequent Pattern Growth (FP-Growth) algorithm for non-incremental mining with the Compact Pattern Tree (CP-Tree) algorithm for incremental mining. Experiments on 3,200, 5,000, 110,000, and 239,496 texts from Twitter showed that the incremental technique reduces processing time and memory usage when mining large Indonesian text data. With CP-Tree, incremental mining was about 1.66 times faster and about 1.84 times more memory-efficient than non-incremental mining with FP-Growth.
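
The incremental idea can be illustrated with a small sketch. The abstract does not describe the authors' implementation, so everything below is an assumption: the class `IncrementalPrefixTree`, its method names, and the sample words are hypothetical, each text is reduced to a plain set of words rather than the SFWI representation, and restructuring is done by naive re-insertion of all paths rather than by the in-place restructuring of the actual CP-Tree algorithm. What it shows is the general pattern: new batches are inserted into an existing prefix tree without rescanning old data, and the tree is occasionally reordered by word frequency to keep it compact.

```python
from collections import Counter


class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class IncrementalPrefixTree:
    """Toy CP-Tree-style structure (hypothetical, for illustration only)."""

    def __init__(self):
        self.root = Node()
        self.freq = Counter()  # running word frequencies
        self.order = {}        # word -> rank fixed at the last restructure

    def _key(self, word):
        # Ranked words come first; unseen words follow alphabetically.
        return (self.order.get(word, len(self.order)), word)

    def _add_path(self, items, count):
        node = self.root
        for w in sorted(items, key=self._key):
            node = node.children.setdefault(w, Node(w))
            node.count += count

    def insert(self, words):
        """Add one text (as a set of words) without touching older data."""
        items = set(words)
        self._add_path(items, 1)
        self.freq.update(items)

    def paths(self, node=None, prefix=()):
        """Yield (word tuple, number of texts ending exactly at that node)."""
        node = node or self.root
        for child in node.children.values():
            path = prefix + (child.item,)
            ending = child.count - sum(c.count for c in child.children.values())
            if ending > 0:
                yield path, ending
            yield from self.paths(child, path)

    def restructure(self):
        """Rebuild the tree in frequency-descending word order."""
        stored = list(self.paths())
        ranked = sorted(self.freq, key=lambda w: (-self.freq[w], w))
        self.order = {w: i for i, w in enumerate(ranked)}
        self.root = Node()
        for path, count in stored:
            self._add_path(path, count)

    def support(self, itemset):
        """Number of inserted texts containing every word in itemset."""
        wanted = set(itemset)
        return sum(c for p, c in self.paths() if wanted <= set(p))

    def node_count(self, node=None):
        node = node or self.root
        return sum(1 + self.node_count(c) for c in node.children.values())


if __name__ == "__main__":
    tree = IncrementalPrefixTree()
    for text in (["macet", "jakarta"], ["jakarta", "banjir"],
                 ["jakarta", "macet", "hujan"]):
        tree.insert(text)
    tree.restructure()  # compact the tree after the first batch
    for text in (["banjir", "jakarta"], ["hujan", "banjir", "jakarta"]):
        tree.insert(text)  # second batch reuses the existing tree
    print(tree.support({"jakarta", "banjir"}), tree.node_count())
```

A non-incremental approach such as plain FP-Growth would instead rebuild its tree from all texts whenever a new batch arrives, which is the cost the incremental technique is meant to avoid.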