Maximal σ-Frequent Subtree Extraction of XML Data by Binary Code

Abstract

The present invention is a method for extracting useful information from XML documents, which have recently become a standard for information exchange and storage on the web. First, XML data is represented in binary code as a set of bit sequences. From this re-expressed data, the invention then extracts the maximal σ-frequent subtrees for a support threshold σ that the user enters directly.

The method comprises three steps:

1. Represent all input XML trees in binary code.
2. Obtain all frequent prefix PairSets from the binary-coded trees.
3. Transform those PairSets back into a tree structure.

In the binary code representation step, an n-bit binary code is first generated for each node. Each path is then represented as the successive concatenation of its nodes' codes.

In the frequent prefix PairSet generation step, each path is decomposed by depth. Each resulting prefix of n-bit codes serves as a key and is paired with the indexes of the trees that contain that key; these pairs form the elements of a PairSet. All the frequent prefix PairSets needed for final frequent subtree generation are then derived from these prefix PairSets.

In the final step, a subtree structure is built from the frequent prefix PairSets.

XML documents are semi-structured. The invention therefore aims to efficiently find common subtrees that satisfy the minimum support σ in documents that have no predetermined schema.
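
The three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made for illustration:

- The (label, children) tuple input format.
- The label-to-code scheme, in which each distinct label gets a sequential integer zero-padded to n bits.
- Treating σ as an absolute count of trees.
- Every function name.

The sketch has two known limitations. First, it merges siblings that share a label. Second, it counts each prefix's support independently, so the union of frequent paths is not guaranteed to occur as a whole in σ trees. The hypothetical `subtree_support` helper checks that condition.

```python
from collections import defaultdict

# Hypothetical input format: each XML tree is a (label, [children]) tuple.


def iter_labels(tree):
    """Yield every element label in a tree (pre-order)."""
    label, children = tree
    yield label
    for child in children:
        yield from iter_labels(child)


def assign_codes(trees):
    """Step 1a (assumed scheme): give each distinct label a fixed-width n-bit code."""
    labels = sorted({lbl for t in trees for lbl in iter_labels(t)})
    n = max(1, (len(labels) - 1).bit_length())
    codes = {lbl: format(i, f"0{n}b") for i, lbl in enumerate(labels)}
    return codes, n


def root_to_leaf_paths(tree, codes, prefix=""):
    """Step 1b: encode each root-to-leaf path as the concatenation of node codes."""
    label, children = tree
    bits = prefix + codes[label]
    if not children:
        yield bits
    for child in children:
        yield from root_to_leaf_paths(child, codes, bits)


def prefix_pairsets(trees, codes, n):
    """Step 2: key every depth-d prefix (d * n bits) to the indexes of trees containing it."""
    pairset = defaultdict(set)
    for idx, tree in enumerate(trees):
        for path in root_to_leaf_paths(tree, codes):
            for depth in range(1, len(path) // n + 1):
                pairset[path[: depth * n]].add(idx)
    return pairset


def frequent_prefixes(pairset, sigma):
    """Keep only prefixes that occur in at least sigma distinct trees."""
    return {key: ids for key, ids in pairset.items() if len(ids) >= sigma}


def build_subtree(frequent, codes, n):
    """Step 3: decode the frequent prefixes back into a nested {label: children} dict."""
    decode = {code: lbl for lbl, code in codes.items()}
    root = {}
    for key in frequent:
        node = root
        for i in range(0, len(key), n):
            node = node.setdefault(decode[key[i : i + n]], {})
    return root


def subtree_support(frequent, keys):
    """Indexes of trees that contain every given prefix, i.e. the assembled subtree."""
    return set.intersection(*(frequent[k] for k in keys))


trees = [
    ("a", [("b", [("c", [])]), ("d", [])]),
    ("a", [("b", [("c", [])]), ("d", []), ("e", [])]),
    ("a", [("f", [])]),
]
codes, n = assign_codes(trees)
freq = frequent_prefixes(prefix_pairsets(trees, codes, n), sigma=2)
print(build_subtree(freq, codes, n))     # {'a': {'b': {'c': {}}, 'd': {}}}
print(subtree_support(freq, freq.keys()))  # {0, 1}
```

With σ = 2, the prefixes ending in `e` and `f` each appear in only one tree and are dropped. That leaves a(b(c), d), which trees 0 and 1 both contain as a whole.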

Bibliographic data

Publication number: KR100539022B1

Publication date: 2005-12-27

Original format: PDF

Application number: KR20040037156

Inventors: 김응모; 백주련
Application date: 2004-05-25

Classification: G06F17/30; G06F17/00

Country: KR

Similar documents

Patents

1. Determining security risks in binary software code based on extracted and accessed network addresses [P]. Chinese patent CN112789615A. 2021-05-11

2. Method and device for extracting code information from a binary file [P]. Chinese patent CN103777966B. 2017-05-31

3. MAXIMAL σ-FREQUENT SUBTREE EXTRACTION OF XML DATA BY BINARY CODE [P]. Korean patent KR20050112229A. 2005-11-30

4. Fast extraction of scalar values from binary encoded XML [P]. US patent US8429196B2. 2013-04-23

5. FAST EXTRACTION OF SCALAR VALUES FROM BINARY ENCODED XML [P]. US patent US2009307239A1. 2009-12-10

Foreign-language literature

1. Research of Methods for Lost Data Reconstruction in Erasure Codes over Binary Fields [J]. Dan Tang. Journal of Electronic Science and Technology (English edition). 2016, no. 1

2. Research of Methods for Lost Data Reconstruction in Erasure Codes over Binary Fields [J]. Dan Tang. Journal of Electronic Science (English edition). 2016, no. 1

3. Frequent item sets mining from high-dimensional dataset based on a novel binary particle swarm optimization [J]. 张中杰, 黄健, 卫莹. Journal of Central South University (English edition). 2016, no. 7

4. A Simple Yet Efficient Approach for Maximal Frequent Subtrees Extraction from a Collection of XML Documents [C]. Juryon Paik, Ung Mo Kim. Web Information Systems - WISE 2006 Workshops; Lecture Notes in Computer Science, vol. 4256. 2006

5. Mining frequent structural patterns from XML datasets [D]. Ali, Mohammed Mohsin. 2012

6. Enumerating all maximal frequent subtrees in collections of phylogenetic trees [O]. Akshay Deepak, David Fernández-Baca. 2014

7. Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees [O]. Yun Chi, Yi Xia, Yirong Yang. 2004

Chinese-language literature

1. Discovering Frequent Subtrees from XML Data Using Neural Networks [J]. SUN Wei, LIU Da-xin, WANG Tong. Wuhan University Journal of Natural Sciences (English edition). 2006, no. 1

2. Research of Methods for Lost Data Reconstruction in Erasure Codes over Binary Fields [J]. Dan Tang. Journal of Electronic Science and Technology. 2016, no. 1

3. GML document structure clustering based on maximal frequent induced subtrees [C]. 朱颖雯, 吉根林. 3rd Jiangsu Computer Conference. 2008

4. Research on key techniques of binary component extraction for malicious code analysis [A]. 鱼源. 2015