Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents

Abstract

Many Web documents, such as HTML and XML files, have no rigid structure and are called semistructured data. In general, such semistructured Web documents are represented by rooted trees with ordered children. We propose a new method for discovering frequent tree-structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis. A tag tree pattern is an edge-labeled tree with ordered children that has structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can be substituted by an arbitrary tree. A tag tree pattern is therefore suited to representing tree-structured patterns in such Web documents. First, we show that it is hard to compute the optimum frequent tag tree pattern. We therefore present an algorithm for generating all maximally frequent tag tree patterns and show its correctness. Finally, we report some experimental results on our algorithm. Although this algorithm is not efficient, the experiments show that it can extract characteristic tree-structured patterns from those data.
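
To make the notions of matching and frequency concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, and it rests on simplifying assumptions that the abstract does not confirm: a tree is a list of (label, children) edges in left-to-right order, variables occur only as leaf edges written "?", and a variable absorbs one or more consecutive sibling edges together with their subtrees. The names VAR, matches and frequency are hypothetical.

```python
# Illustrative sketch only; the representation and the variable semantics
# below are assumptions, not the paper's definitions.
VAR = "?"  # hypothetical marker for a variable edge in a pattern


def matches(pattern, tree):
    """Return True if the whole tree is an instance of the pattern.

    Both arguments are lists of (label, children) edges in sibling order.
    A variable edge (label == VAR) is assumed to be a leaf of the pattern
    and to absorb one or more consecutive sibling edges with their subtrees.
    """
    def go(i, j):
        # i indexes pattern edges, j indexes tree edges at this level.
        if i == len(pattern):
            return j == len(tree)  # every tree edge must be consumed
        label, sub = pattern[i]
        if label == VAR:
            # Try every non-empty run of tree edges starting at position j.
            return any(go(i + 1, k) for k in range(j + 1, len(tree) + 1))
        if j == len(tree):
            return False
        t_label, t_sub = tree[j]
        return label == t_label and matches(sub, t_sub) and go(i + 1, j + 1)

    return go(0, 0)


def frequency(pattern, documents):
    """Fraction of documents that the pattern matches."""
    return sum(matches(pattern, d) for d in documents) / len(documents)


# Three toy documents, written as edge-labeled ordered trees.
d1 = [("html", [("head", [("title", [])]), ("body", [("p", []), ("p", [])])])]
d2 = [("html", [("head", []), ("body", [("table", [])])])]
d3 = [("html", [("body", [("p", [])])])]
docs = [d1, d2, d3]

# "html" containing anything, then a "body" that contains anything.
p = [("html", [(VAR, []), ("body", [(VAR, [])])])]
```

Under these assumptions, p matches d1 and d2 but not d3, because in d3 nothing precedes "body" for the first variable to absorb. A pattern would then count as frequent when frequency(p, docs) reaches a user-chosen threshold.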

Bibliographic details

Source: Advances in Knowledge Discovery and Data Mining, 2002, pp. 341-355 (15 pages)
Authors: Tetsuhiro Miyahara; Yusuke Suzuki; Takayoshi Shoudai; Tomoyuki Uchida; Kenichi Takahashi; Hiroaki Ueda
Original format: PDF
Chinese Library Classification: Automation technology, computer technology
