Fine-grain Web Site Structure Discovery

Abstract

Several techniques have recently been proposed to automatically derive web wrappers, i.e., programs that extract data from HTML pages and transform them into a more structured format, typically in XML syntax. These techniques automatically induce a wrapper from a set of sample pages that share a common HTML template. An open issue, however, is how to collect suitable classes of sample pages to feed the wrapper inducer; presently, the pages are chosen manually. In this paper, we tackle the problem of automatically discovering the main classes of pages offered by a site by exploring only a small, representative portion of it. The web site model we propose describes the structure of the site as a graph whose nodes are classes of pages that share a common structure, and whose edges represent links among instances of the page classes. Using this model, we have developed an algorithm that accepts the URL of an entry point to the target web site, visits a limited portion of the site, and produces an accurate model of the site structure. We also report on preliminary experiments performed on actual web sites, which have produced encouraging results.
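As a rough illustration of the site model described above (page classes as nodes, links between their instances as edges), the Python sketch below visits a limited number of pages from an entry URL and groups them by structure. It is not the authors' algorithm. The structural signature (the set of root-to-element tag paths of a page), the Jaccard-similarity grouping with a hypothetical `threshold`, the breadth-first visit capped by a hypothetical `max_pages`, the `fetch` callable, the helper names (`PageParser`, `parse_page`, `jaccard`, `discover_site_model`), and the in-memory `SITE` at `example.org` are all assumptions made for this example.

```python
# Hypothetical illustration of a page-class graph; NOT the paper's algorithm.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects root-to-element tag paths and the href targets of <a> tags."""

    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags
        self.paths = set()  # e.g. "html/body/ul/li/a"
        self.links = []     # raw href values, in document order

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parse_page(html):
    """Return (structural signature, outgoing hrefs) for one page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return frozenset(parser.paths), parser.links


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def discover_site_model(entry_url, fetch, max_pages=50, threshold=0.8):
    """BFS from entry_url over at most max_pages pages; returns (classes, edges).

    A page joins the most similar class (compared with the class's first
    member) if the Jaccard similarity reaches threshold, else starts a new one.
    """
    classes = []     # [{"signature": frozenset, "members": [url, ...]}]
    page_class = {}  # url -> class index
    page_links = {}  # url -> absolute link targets
    seen, queue = {entry_url}, deque([entry_url])
    while queue and len(page_class) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable or off-site page
            continue
        signature, hrefs = parse_page(html)
        sims = [jaccard(signature, c["signature"]) for c in classes]
        if sims and max(sims) >= threshold:
            cls = sims.index(max(sims))
            classes[cls]["members"].append(url)
        else:
            classes.append({"signature": signature, "members": [url]})
            cls = len(classes) - 1
        page_class[url] = cls
        page_links[url] = [urljoin(url, h) for h in hrefs]
        for target in page_links[url]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Class-level edges: one (source_class, target_class) pair per link
    # between two visited pages.
    edges = {(page_class[src], page_class[dst])
             for src, targets in page_links.items()
             for dst in targets if dst in page_class}
    return classes, edges


# A hypothetical three-template site held in memory instead of fetched.
SITE = {
    "http://example.org/":
        '<html><body><h1>Home</h1><a href="/teams">Teams</a></body></html>',
    "http://example.org/teams":
        '<html><body><h2>Teams</h2><ul><li><a href="/team/1">A</a></li>'
        '<li><a href="/team/2">B</a></li></ul></body></html>',
    "http://example.org/team/1":
        '<html><body><h2>Team A</h2><table><tr><td>Ann</td></tr></table>'
        '<a href="/teams">Back</a></body></html>',
    "http://example.org/team/2":
        '<html><body><h2>Team B</h2><table><tr><td>Bob</td></tr></table>'
        '<a href="/teams">Back</a></body></html>',
}

classes, edges = discover_site_model("http://example.org/", SITE.get)
for i, c in enumerate(classes):
    print(i, c["members"])
print(sorted(edges))
```

On the toy `SITE`, the sketch should put the two team pages in one class and report the class-level edges `[(0, 1), (1, 2), (2, 1)]`. Comparing each page only against a class's first member keeps the example short but is a simplification; a practical crawler would also need a policy for choosing which links to follow, which this sketch does not attempt.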

Bibliographic Record

Source: ACM (Association for Computing Machinery) International Workshop on Web Information and Data Management (WIDM 2003); November 7-8, 2003; New Orleans, LA, US; pp. 15-22 (8 pages)
Conference location: New Orleans, LA (US)
Authors: Valter Crescenzi; Paolo Merialdo; Paolo Missier
Affiliation: Università Roma Tre, D.I.A., Roma, Italy

Original format: PDF
Text language: English
Chinese Library Classification: Computer networks
Keywords: web information systems; information extraction; wrapper induction; clustering; web modeling

Similar Documents

1. An empirical study of web site navigation structures' impacts on web site usability [J]. Xiang Fang, Clyde W. Holsapple. Decision Support Systems. 2007, no. 2

2. Building a Culturally-Competent Web Site: A Cross-Cultural Analysis of Web Site Structure [J]. Cui Tingru, Wang Xinwei, Teo Hock-Hai. Journal of Global Information Management. 2015, no. 4

3. WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures [J]. Mike P. Liang, D. Rey Banatao, Teri E. Klein. Nucleic Acids Research. 2003, no. 13

4. Fine-grain Web Site Structure Discovery [C]. Valter Crescenzi, Paolo Merialdo, Paolo Missier. Association for Computing Machinery International Workshop on Web Information and Data Management. 2003

5. Leaders' in English language learning (ELL) perceptions of ELL Internet Web sites' adherence to ELL standards and Web selection criteria using the Survey of ELL Web Sites: A case study [D]. King, LaRee Kay. 2004

6. WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures [O]. Mike P. Liang, D. Rey Banatao, Teri E. Klein. 2003

7. Discovery of concept entities from Web sites using web unit mining [O]. Yin, Ming; Goh, Dion Hoe-lian; Lim. 2015
