This paper surveys techniques for automatically extracting text information from Web pages. By analyzing the development of the three information extraction models and the four classes of machine learning algorithms commonly used in this field, it describes the current mainstream techniques for automatic Web page text extraction and compares the application scope of each method. Finally, it discusses the current hot issues and development trends in this area.