Extracting the Latent Hierarchical Structure of Web Documents

Abstract

The hierarchical structure of a document plays an important role in understanding the relationships between its contents. However, web documents do not always represent such a structure explicitly through the available HTML hierarchical tags. Headings, on the other hand, are usually differentiated from 'normal' text in terms of presentation, thus providing an implicit structure discernible by a human reader. An important pre-processing step for applications that need to operate at the hierarchical level is therefore to extract this implicitly represented hierarchical structure. This paper presents an algorithm for heading detection and heading level detection that makes use of various visual presentation cues. Results of evaluating this algorithm are also reported.
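The abstract states the idea but not the algorithm's details. As a rough illustration only, and not the authors' algorithm, the following Python sketch recovers a heading hierarchy from visual cues under these assumptions: each text block has already been extracted with a font size and a bold flag; the style covering the most characters is normal text; any strictly more prominent style is a heading style; and heading levels are assigned by ranking those styles by font size, then boldness. The names `Block`, `Node`, `detect_levels`, `build_tree`, and `outline` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A run of text plus the visual features this sketch assumes are available."""
    text: str
    font_size: float
    bold: bool = False


@dataclass
class Node:
    """A heading in the recovered hierarchy, with its body text and subheadings."""
    title: str
    level: int
    body: list = field(default_factory=list)
    children: list = field(default_factory=list)


def style_of(block):
    # Prominence key: larger font ranks higher; boldness breaks ties.
    return (block.font_size, block.bold)


def detect_levels(blocks):
    """Map each heading style to a level (1 = most prominent)."""
    # Assumption: the style covering the most characters is normal body text.
    weights = Counter()
    for b in blocks:
        weights[style_of(b)] += len(b.text)
    body_style = weights.most_common(1)[0][0]
    # Assumption: any style more prominent than body text marks a heading.
    heading_styles = sorted((s for s in weights if s > body_style), reverse=True)
    return {s: rank + 1 for rank, s in enumerate(heading_styles)}


def build_tree(blocks):
    """Nest blocks under headings using a stack of currently open headings."""
    levels = detect_levels(blocks)
    root = Node(title="<document>", level=0)
    stack = [root]
    for b in blocks:
        level = levels.get(style_of(b))
        if level is None:
            # Body text belongs to the innermost open heading.
            stack[-1].body.append(b.text)
            continue
        # Close any open heading at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title=b.text, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Return the heading titles, indented two spaces per level."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical input: body text in 11pt, headings in 18pt and 14pt bold.
doc = [
    Block("Introduction", 18, True),
    Block("Web pages rarely use heading tags consistently.", 11),
    Block("Motivation", 14, True),
    Block("Readers rely on visual cues.", 11),
    Block("Method", 18, True),
    Block("We compare font sizes and weights.", 11),
]
print("\n".join(outline(build_tree(doc))))
```

Because a new heading closes every open heading at its own level or deeper, "Method" returns to the top level after "Motivation". Ranking styles by font size and boldness is only one plausible heuristic; the paper's algorithm draws on various visual presentation cues, which this sketch does not attempt to reproduce.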

Bibliographic record

Source
International Conference on Signal-Image Technology and Internet-Based Systems, 2009, 9 pages
Authors
Michael A. El-Shayeb; Samhaa R. El-Beltagy; Ahmed Rafea
Original format: PDF
Chinese Library Classification: TP3-53
Keywords
Heading detection; Heading level detection; Document structure

Similar documents

1. Extracting Events from Web Documents for Social Media Monitoring Using Structured SVM [J]. Yoonjae CHOI, Pum-Mo RYU, Hyunki KIM. IEICE Transactions on Information and Systems, 2013, no. 6.
2. Subtopic mining using simple patterns and hierarchical structure of subtopic candidates from web documents [J]. Se-Jong Kim, Jong-Hyeok Lee. Information Processing & Management, 2015, no. 6.
3. Extracting the Latent Hierarchical Structure of Web Documents [C]. Michael A. El-Shayeb, Samhaa R. El-Beltagy, Ahmed Rafea. International Conference on Signal-Image Technology and Internet-Based Systems, 2009.
4. Generating coherent extracts of single documents using latent semantic analysis [D]. Tristan Miller. 2003.
5. HOLON: extending Web document libraries via objects in order to support the health information infrastructure. Health Object Library Online [O]. B. G. Silverman, P. Jones, C. Safran. 1998.
6. Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics [O]. Behnam Taheri Khameneh, Hamid Shokrzadeh. 2020.
