
Multi-stage modeling of HTML documents.



Abstract

The goal of this thesis is, first, to give the reader an accurate picture of several models of information discovery and information extraction on the World Wide Web, and of how these two processes are becoming increasingly interrelated in overall information analysis. It then investigates how sophisticated analysis of visual documents, such as web pages, is becoming increasingly important both for finding document information and for understanding its context. The thesis presents several problems in document analysis and approximates solutions to them within a general analysis framework, implemented as a prototype application. Finally, an instance of the framework demonstrates the framework's practicality by accumulating statistics on features of web documents, such as script and style usage, that only a deeper document analysis can reveal.
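The abstract's closing sentence mentions gathering statistics on script and style usage across web documents. The record does not describe how the thesis framework does this, so what follows is only a minimal sketch under stated assumptions, not the author's method: it uses Python's standard-library `html.parser` to tally a few script- and style-related features in a single document. The class `FeatureCounter`, the function `feature_counts`, and the feature category names are hypothetical. This token-level count is also much shallower than the "deeper document analysis" the thesis refers to; it does not execute scripts or compute rendered styles.

```python
from collections import Counter
from html.parser import HTMLParser


class FeatureCounter(HTMLParser):
    """Hypothetical helper: tallies script- and style-related markup in one HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = dict(attrs)
        if tag == "script":
            # External scripts carry a src attribute; everything else is inline code.
            kind = "external_script" if "src" in attributes else "inline_script"
            self.counts[kind] += 1
        elif tag == "style":
            self.counts["style_element"] += 1
        elif tag == "link":
            # rel is a space-separated token list, e.g. "alternate stylesheet".
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.counts["linked_stylesheet"] += 1
        if "style" in attributes:
            # Inline style="..." attribute on any element.
            self.counts["style_attribute"] += 1
        # Attributes such as onclick/onload embed script in markup (approximated by the "on" prefix).
        handlers = sum(1 for name in attributes if name.startswith("on"))
        if handlers:
            self.counts["event_handler"] += handlers


def feature_counts(html):
    """Return a Counter of the hypothetical feature categories for one document."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Because each call returns a `collections.Counter`, per-document results can be summed with `+` to obtain corpus-level usage statistics over a crawl.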

Bibliographic details

  • Author

    Levering, Ryan Reed

  • Author affiliation

    State University of New York at Binghamton

  • Degree-granting institution: State University of New York at Binghamton
  • Subject: Computer science; Information science
  • Degree: M.S.
  • Year: 2004
  • Pages: 76
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Aquaculture and fisheries
