首页> 外文期刊>Data mining and knowledge discovery >Mining rooted ordered trees under subtree homeomorphism
【24h】

Mining rooted ordered trees under subtree homeomorphism

机译:在子树同胚下挖掘有根有序树

获取原文
获取原文并翻译 | 示例
获取外文期刊封面目录资料

摘要

Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant.
机译:挖掘频繁的树模式在不同领域有许多应用程序,例如XML数据,生物信息学和万维网。频繁模式挖掘中的关键步骤是频率计数,它涉及匹配运算符,以查找给定树木集合中树木模式的出现(实例)。广泛使用的用于树状结构数据的匹配算子是子树同胚,其中树状图的边缘映射到给定树中的祖先后代关系。在子树同胚状态下频繁出现的树模式通常称为嵌入模式。在本文中,我们提出了一种有效的子树同胚算法,并应用于频繁模式挖掘。我们提出了一个紧凑的数据结构,称为occ,它仅存储有关最右边出现路径的信息,因此可以编码和表示树模式的多次出现。然后,我们在occ数据结构上定义有效的联接操作,这有助于我们根据树模式的正确子树的出现次数来计数树模式的出现。基于提出的子树同胚方法,我们开发了一种有效的模式挖掘算法,称为TPMiner。我们评估了TPMiner在一些实际和综合数据集上的效率。我们广泛的实验证实TPMiner总是优于已知的现有算法,并且在某些情况下,相对于现有算法的改进非常重要。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号