Declaration
ACKNOWLEDGEMENT
ABSTRACT
Abstract (Chinese)
Table of Contents
1 Introduction
1.1 Motivation
1.2 Goal
1.3 Trends in the Field of Information Extraction
2 Information Extraction Techniques
2.1 Pattern-Based Extraction of Named Entities
2.1.1 Named Entity Recognition
2.1.2 Entity Relation Detection
2.2 Regular Expression
2.3 Analyses of HTML Documents
2.3.1 Document Code Modeling
2.3.2 HTML Code Analysis
2.3.3 Conceptual Modeling
2.3.4 Visual Analysis of HTML Documents
3 Visual Modeling Approach to Information Extraction
3.1 Visual Information Analysis
3.1.1 Page Layout Model
3.1.2 Text Attribute Model
3.1.3 Logical Document Structure
3.2 Information Extraction from the Logical Structure
4 Design and Implementation of a Pattern-Based IE System
4.1 Technologies Analysis
4.1.1 HTML Retrieval API
4.1.2 HTML Parser API
4.2 System Design
4.2.1 User Interaction
4.2.2 User Interface Interaction
4.2.3 Download HTML Documents
4.2.4 Extract Data
4.2.5 Interaction Process
4.3 System Implementation
4.3.1 Generate Search URL
4.3.2 Download HTML Files
4.3.3 Implement Data Extraction
4.3.4 Global Interaction
4.4 Input/Output of Pattern-Based IE System
5 Evaluation of Results
6 Conclusion and Future Possibilities
6.1 Summary
6.2 Future Possibilities
References
Curriculum Vitae of Author
Thesis Data Set