Maximal σ-Frequent Subtree Extraction of XML Data by Binary Code
Abstract
The present invention is a method of extracting useful information from XML documents, which have become a standard for information exchange and storage on the web. The XML data are first represented in binary code, that is, re-expressed as a set of bit sequences. From the re-expressed data, the method extracts the maximal σ-frequent subtree that satisfies a value of σ supplied directly by the user.

The method comprises three steps: representing all input XML trees in binary code, obtaining all frequent prefix PairSets from the binary-coded trees, and transforming those PairSets back into a tree structure. In the binary code representation step, an n-bit binary code is first generated for each node, and each path is then represented as the successive concatenation of the codes of its nodes. In the frequent prefix PairSet generation step, each path is decomposed by depth; each decomposed n-bit prefix serves as a key and is paired with the indexes of the trees that contain that key, forming the elements of a PairSet. All frequent prefix PairSets needed to generate the final frequent subtree are then derived from these prefix PairSets. In the final step, a subtree structure is built from the frequent prefix PairSets.

Because XML documents are semi-structured, the invention aims to efficiently find a common subtree that satisfies the minimum support σ for documents that have no predetermined schema.
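The three steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Trees are given as nested `(label, children)` tuples rather than parsed XML, σ is taken as an absolute number of trees, a node is identified only by the sequence of labels on its path from the root, and all function names are hypothetical.

```python
from collections import defaultdict

# Assumption: a tree is a nested tuple (label, [children]).

def assign_codes(trees):
    """Step 1a: give every distinct node label a fixed-width n-bit code."""
    labels = set()

    def collect(node):
        labels.add(node[0])
        for child in node[1]:
            collect(child)

    for tree in trees:
        collect(tree)
    ordered = sorted(labels)
    n = max(1, (len(ordered) - 1).bit_length())  # bits needed per label
    codes = {label: format(i, f"0{n}b") for i, label in enumerate(ordered)}
    return codes, n

def path_prefixes(tree, codes):
    """Step 1b: concatenate node codes along each path; keep every depth-wise prefix."""
    prefixes = set()

    def walk(node, parent_bits):
        bits = parent_bits + codes[node[0]]
        prefixes.add(bits)
        for child in node[1]:
            walk(child, bits)

    walk(tree, "")
    return prefixes

def frequent_prefix_pairsets(trees, codes, min_support):
    """Step 2: map each prefix key to the tree indexes containing it; keep frequent ones."""
    pairsets = defaultdict(set)
    for index, tree in enumerate(trees):
        for key in path_prefixes(tree, codes):
            pairsets[key].add(index)
    return {k: v for k, v in pairsets.items() if len(v) >= min_support}

def rebuild(frequent, codes, n):
    """Step 3: turn the frequent prefixes back into (label, [children]) trees."""
    decode = {code: label for label, code in codes.items()}
    nodes, roots = {}, []
    # Shorter keys first, so a parent always exists before its children.
    for key in sorted(frequent, key=lambda k: (len(k), k)):
        node = (decode[key[-n:]], [])
        nodes[key] = node
        parent_key = key[:-n]
        if parent_key:
            nodes[parent_key][1].append(node)
        else:
            roots.append(node)
    return roots

# Hypothetical input: three small trees sharing the root label "a".
trees = [
    ("a", [("b", [("d", [])]), ("c", [])]),
    ("a", [("b", [("e", [])]), ("c", [])]),
    ("a", [("b", [("d", [])])]),
]
codes, n = assign_codes(trees)
frequent = frequent_prefix_pairsets(trees, codes, min_support=2)
subtrees = rebuild(frequent, codes, n)
```

With `min_support=2`, the path ending in `e` occurs in only one tree and is dropped, while the paths ending in `c` and `d` each occur in two trees and are kept. Because any tree containing a long prefix also contains all of its shorter prefixes, the frequent keys always form a connected tree, which is why the rebuild step can attach each node to its parent directly.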