Journal: IEICE Transactions on Information and Systems

Efficient Substructure Discovery from Large Semi-Structured Data

Abstract

In this paper, we consider a data mining problem for semi-structured data. Modeling semi-structured data as labeled ordered trees, we present an efficient algorithm for discovering frequent substructures in a large collection of semi-structured data. By extending the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, our algorithm scales almost linearly in the total size of the maximal tree patterns contained in an input collection, and depends only mildly on the size of the longest pattern. We also develop several pruning techniques that significantly speed up the search. Experiments on Web data show that our algorithm, combined with the proposed pruning techniques, runs efficiently on real-life datasets over a wide range of parameters.
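
To make the problem setting concrete, the following is a minimal sketch of what it could mean for a labeled ordered tree pattern to occur in a collection of data trees. It is not the algorithm of the paper: it is a naive matcher, and the tuple representation, the function names (`matches_at`, `occurs_in`, `support`), the matching rule (parent-child edges and left-to-right sibling order are preserved), and the per-tree definition of support are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a node is (label, [children, left to right]).
Tree = Tuple[str, list]


def matches_at(pattern: Tree, node: Tree) -> bool:
    """Return True if `pattern` embeds in the data tree with its root at `node`.

    Assumed matching rule: labels must be equal, and the pattern's children
    must map, in order, onto a subsequence of the node's children.
    """
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    matched = 0  # number of pattern children embedded so far
    for child in n_children:
        # Greedy leftmost matching is sufficient for an ordered subsequence.
        if matched < len(p_children) and matches_at(p_children[matched], child):
            matched += 1
    return matched == len(p_children)


def occurs_in(pattern: Tree, tree: Tree) -> bool:
    """Return True if `pattern` matches at any node of `tree`."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])


def support(pattern: Tree, collection: List[Tree]) -> float:
    """Fraction of trees in `collection` that contain `pattern` (assumed definition)."""
    return sum(occurs_in(pattern, t) for t in collection) / len(collection)


# A toy collection of three HTML-like trees (illustrative data only).
collection = [
    ("html", [("body", [("p", []), ("a", []), ("p", [])])]),
    ("html", [("body", [("a", []), ("p", [])])]),
    ("html", [("head", []), ("body", [("p", [])])]),
]

p_then_a = ("body", [("p", []), ("a", [])])  # a "p" followed by an "a" under "body"
just_p = ("body", [("p", [])])               # a "p" under "body"

print(support(p_then_a, collection))
print(support(just_p, collection))
```

In this toy collection, `p_then_a` is contained only in the first tree, because sibling order matters, while `just_p` is contained in all three. A pattern would be called frequent if its support reaches a user-chosen threshold. A practical miner has to enumerate candidate patterns without duplication and prune infrequent ones early, which is the part the abstract's enumeration and pruning techniques address and which this sketch does not attempt.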