
Maximal σ-Frequent Subtree Extraction of XML Data by Binary Code


Abstract

The present invention is a method for extracting useful information from XML documents, which have recently been adopted as a standard for information exchange and storage on the web. XML data is first re-expressed in binary code as a set of bit sequences, and the invention then extracts from this re-expressed data the maximal σ-frequent subtrees that satisfy a minimum support σ entered directly by the user.

The method comprises three steps: (1) representing every input XML tree in binary code, (2) obtaining all frequent prefix PairSets from the binary-coded trees, and (3) transforming the frequent prefix PairSets back into a tree structure. In the binary code representation step, an n-bit binary code is first generated for each node, and each path is then represented by concatenating the codes of its successive nodes. In the frequent prefix PairSet generation step, each path is decomposed by depth; each resulting prefix of n-bit node codes is used as a key and paired with the indexes of the trees that contain that key, and these pairs form the elements of a PairSet. All frequent prefix PairSets needed to generate the final frequent subtrees are then derived from these prefix PairSets. In the final step, a subtree structure is built from the frequent prefix PairSets. Because XML documents are semi-structured, the invention aims to efficiently find common subtrees that satisfy the minimum support σ across documents that have no predefined schema.
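The abstract does not publish an implementation. The following is a minimal Python sketch of steps (1) and (2) under stated assumptions: each tree is given as a hypothetical `(label, children)` tuple, every distinct label receives a fixed-width n-bit code in sorted-label order (n is the smallest width that fits all labels), and σ is treated as the fraction of trees that must contain a prefix. The function names `assign_codes`, `build_pairsets`, and `frequent_pairsets` are illustrative only, not names from the patent.

```python
from typing import Dict, List, Set, Tuple

# Assumed input format (hypothetical): a tree is (label, [child trees]).
Tree = Tuple[str, List["Tree"]]


def root_to_leaf_paths(tree: Tree) -> List[List[str]]:
    """Return every root-to-leaf label path of a tree."""
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + path for child in children for path in root_to_leaf_paths(child)]


def assign_codes(trees: List[Tree]) -> Tuple[Dict[str, str], int]:
    """Give each distinct label a fixed-width n-bit code (assumed scheme)."""
    labels = sorted({lab for t in trees for p in root_to_leaf_paths(t) for lab in p})
    n = max(1, (len(labels) - 1).bit_length())  # smallest n with 2**n >= #labels
    codes = {lab: format(i, f"0{n}b") for i, lab in enumerate(labels)}
    return codes, n


def build_pairsets(trees: List[Tree], codes: Dict[str, str], n: int) -> Dict[str, Set[int]]:
    """Map each depth-d prefix (d*n bits) to the indexes of the trees containing it."""
    pairsets: Dict[str, Set[int]] = {}
    for tree_id, tree in enumerate(trees):
        for path in root_to_leaf_paths(tree):
            bits = "".join(codes[lab] for lab in path)  # concatenated node codes
            for depth in range(1, len(path) + 1):
                pairsets.setdefault(bits[: depth * n], set()).add(tree_id)
    return pairsets


def frequent_pairsets(pairsets: Dict[str, Set[int]], num_trees: int,
                      sigma: float) -> Dict[str, Set[int]]:
    """Keep prefixes contained in at least a sigma fraction of the trees."""
    return {key: ids for key, ids in pairsets.items() if len(ids) >= sigma * num_trees}


if __name__ == "__main__":
    trees = [
        ("book", [("title", []), ("author", [("name", [])]), ("year", [])]),
        ("book", [("title", []), ("price", [])]),
        ("book", [("author", [("name", [])]), ("price", [])]),
    ]
    codes, n = assign_codes(trees)
    pairsets = build_pairsets(trees, codes, n)
    frequent = frequent_pairsets(pairsets, len(trees), sigma=0.6)
    for key in sorted(frequent):
        print(key, sorted(frequent[key]))
```

With σ = 0.6 over three trees, a prefix must appear in at least two trees, so `book/year` (present only in the first tree) is dropped. Any tree containing a deeper prefix also contains all of its shorter prefixes, so the surviving set is prefix-closed: a shallow prefix never fails while a deeper one passes.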
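For the final step, here is a hedged Python sketch that decodes a set of frequent binary prefixes back into a tree. The 3-bit code table, the prefix set, and the names `rebuild_subtree` and `to_xml` are hypothetical. The sketch simply takes the union of all frequent root-to-node prefixes as the output subtree; because such a set is prefix-closed, the union always forms a tree. The patent's exact maximality criterion may differ.

```python
from typing import Dict, Iterable


def rebuild_subtree(frequent_keys: Iterable[str], codes: Dict[str, str], n: int) -> dict:
    """Turn frequent binary prefixes into a nested {label: children} tree."""
    decode = {code: label for label, code in codes.items()}
    root: dict = {}
    for key in frequent_keys:
        node = root
        # Walk the key n bits at a time; each chunk names one node on the path.
        for i in range(0, len(key), n):
            node = node.setdefault(decode[key[i:i + n]], {})
    return root


def to_xml(tree: dict) -> str:
    """Render the nested dict as XML, with children sorted for deterministic output."""
    return "".join(f"<{label}>{to_xml(kids)}</{label}>"
                   for label, kids in sorted(tree.items()))


# Hypothetical 3-bit code table and sigma-frequent prefixes, for illustration only.
codes = {"author": "000", "book": "001", "name": "010",
         "price": "011", "title": "100", "year": "101"}
frequent = {"001", "001100", "001000", "001000010", "001011"}
print(to_xml(rebuild_subtree(frequent, codes, 3)))
```

Because this sketch keys on label paths, sibling nodes that share a label collapse into a single node in the rebuilt subtree.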

Bibliographic data

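  • Publication/announcement number: KR100539022B1
  • Publication/announcement date: 2005-12-27
  • Original format: PDF
  • Application/patent number: KR20040037156
  • Inventors: 김응모; 백주련
  • Application date: 2004-05-25
  • Classification: G06F17/30; G06F17/00
  • Country: KR
  • Database entry time: 2022-08-21 21:27:12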
