IEEE International Conference on Cloud Computing in Emerging Markets

Compressing Closed Frequent Itemsets with Controlled Information Loss

Abstract

Closed frequent itemsets (CFIs) condense frequent itemsets without loss of information. For large and dense datasets, such as big data and unbounded big data streams, even the number of CFIs generated can be enormous. In such scenarios, an approximate solution is preferred over an exact one. The Subset Significance Threshold (SST) is an effective constraint variable for mining significant CFIs: the support of each insignificant CFI is approximated by the support of its immediate superset. However, due to a chaining effect, a few insignificant CFIs are approximated beyond the specified SST. To overcome this limitation, this paper proposes an enhancement to SST (e-SST) that improves the accuracy of the approximated insignificant CFIs. The merging of insignificant CFIs into their supersets is limited to one level, so that the approximation stays bounded within the specified SST. Experimental results show that the e-SST technique is more efficient than SST at keeping the approximated support of insignificant CFIs within the specified threshold, thus reducing information loss.
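
The chaining effect and the one-level restriction can be illustrated with a small sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration and not the authors' algorithm: a CFI is treated as insignificant when the relative support drop to its immediate superset is below the SST, the immediate superset is taken to be the proper superset with the highest support, and the helper names (`immediate_superset`, `approximate_chained`, `approximate_one_level`) and the toy supports are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

Itemset = FrozenSet[str]


def immediate_superset(x: Itemset, supports: Dict[Itemset, int]) -> Optional[Itemset]:
    """Assumed rule: the proper superset of x with the highest support."""
    candidates = [y for y in supports if x < y]
    if not candidates:
        return None
    return max(candidates, key=lambda y: (supports[y], -len(y)))


def is_insignificant(x: Itemset, supports: Dict[Itemset, int], sst: float) -> bool:
    """Assumed test: relative support drop to the immediate superset is below sst."""
    y = immediate_superset(x, supports)
    if y is None:
        return False
    return (supports[x] - supports[y]) / supports[x] < sst


def approximate_chained(supports: Dict[Itemset, int], sst: float) -> Dict[Itemset, int]:
    """Chained merging: keep climbing while the current CFI is insignificant."""
    approx = {}
    for x in supports:
        cur = x
        while is_insignificant(cur, supports, sst):
            cur = immediate_superset(cur, supports)
        approx[x] = supports[cur]
    return approx


def approximate_one_level(supports: Dict[Itemset, int], sst: float) -> Dict[Itemset, int]:
    """One-level merging: an insignificant CFI takes its immediate superset's support only."""
    approx = {}
    for x in supports:
        if is_insignificant(x, supports, sst):
            approx[x] = supports[immediate_superset(x, supports)]
        else:
            approx[x] = supports[x]
    return approx


def max_relative_error(supports: Dict[Itemset, int], approx: Dict[Itemset, int]) -> float:
    """Largest relative gap between true and approximated support."""
    return max(abs(supports[x] - approx[x]) / supports[x] for x in supports)


# Hypothetical toy data: three nested CFIs with slowly decreasing support.
supports = {
    frozenset("a"): 100,
    frozenset("ab"): 92,
    frozenset("abc"): 85,
}
sst = 0.1

chained = approximate_chained(supports, sst)
one_level = approximate_one_level(supports, sst)

print(max_relative_error(supports, chained))    # exceeds sst in this toy case
print(max_relative_error(supports, one_level))  # stays below sst in this toy case
```

In this toy case each single step is within the threshold, but following the chain from `{a}` through `{a, b}` to `{a, b, c}` accumulates an error larger than the SST, while stopping after one level does not. How the paper selects supersets and stores the compressed result may differ from this sketch.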