Extracting Content Structure for Web Pages Based on Visual Representation

Abstract

This paper proposes a new web content structure based on visual representation. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. The paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure. It simulates how a user understands web layout structure based on visual perception. Compared with existing techniques, our approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure. Experiments show satisfactory results.
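
The abstract does not spell out the algorithm, so the following is only a rough illustration of what a top-down, tag-tree-independent segmentation might look like. It uses a recursive XY-cut over rendered bounding boxes, which is a substitute technique chosen for brevity and not the authors' method. The names `Box`, `Node`, `split_by_gaps`, `build_structure`, and the `min_gap` threshold are all hypothetical, and the sample page coordinates are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical input: rendered rectangles as (left, top, right, bottom) in pixels.
# No HTML tag information is used, only geometry.
Box = Tuple[float, float, float, float]


@dataclass
class Node:
    boxes: List[Box]
    children: List["Node"] = field(default_factory=list)


def split_by_gaps(boxes: List[Box], axis: int, min_gap: float) -> List[List[Box]]:
    """Group boxes into runs separated by empty bands at least min_gap wide.

    axis=0 looks for vertical separators (gaps along x),
    axis=1 looks for horizontal separators (gaps along y).
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest edge covered so far on this axis
    for box in ordered[1:]:
        if box[axis] - reach >= min_gap:
            groups.append([box])  # wide enough blank band: start a new block
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def build_structure(boxes: List[Box], min_gap: float = 10.0) -> Node:
    """Recursively split a set of boxes, top-down, into a tree of blocks."""
    node = Node(boxes)
    if len(boxes) <= 1:
        return node
    for axis in (1, 0):  # try horizontal separators first, then vertical
        groups = split_by_gaps(boxes, axis, min_gap)
        if len(groups) > 1:
            node.children = [build_structure(g, min_gap) for g in groups]
            break
    return node


# Invented layout: header, left navigation, main content, footer.
header = (0, 0, 800, 50)
nav = (0, 60, 150, 500)
main = (160, 60, 800, 500)
footer = (0, 510, 800, 550)

tree = build_structure([header, nav, main, footer])
print(len(tree.children), len(tree.children[1].children))
```

In this sketch the page first splits into three horizontal bands, and the middle band then splits into navigation and main content, regardless of how the underlying markup nests those regions.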

Bibliographic details

Source
5th Asia-Pacific Web Conference on Web Technologies and Applications, APWeb 2003, Apr 23-25, 2003, Xian, China | 2003 | pp. 406-417 | 12 pages
Conference location: Xian (CN)
Authors
Deng Cai; Shipeng Yu; Ji-Rong Wen; Wei-Ying Ma
Author affiliation

Tsinghua University, Beijing, P.R. China

Conference organizer
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Keywords
