Data Extraction from Web Forums Based on Similarity of Page Layout

Abstract

Web forums contain a wealth of information resources. Forum data can be used in areas such as Internet community mining, information retrieval, and public opinion analysis. This paper addresses two problems: what should be extracted from web forums, and how to extract it. To overcome the limitations of current methods for extracting data from web forums, an automated method is proposed for extracting metadata from web forum pages. The method works in two steps. First, it recognizes the topic block by making full use of the special layout of web forum pages; then it extracts metadata from the topic block by exploiting the statistical regularities of the metadata. The whole process requires no manual work. Experimental results show that the method performs well in both adjustability and accuracy.
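
The abstract only outlines the two steps, so the following is an illustrative sketch under stated assumptions, not the authors' algorithm. It approximates "layout similarity" by comparing the pre-order tag sequences of sibling DOM subtrees with `difflib.SequenceMatcher`, and treats the largest group of siblings that resemble a common seed as the topic block (one record per post). It then stands in for the paper's "statistical regularities" with two simple heuristics: a position whose values mostly match a date pattern is labeled a timestamp, and a position with short average text is labeled metadata. The helpers `Node`, `TreeBuilder`, `find_topic_block`, `label_fields` and `extract`, the thresholds, the `YYYY-MM-DD` date format and the labels `time`/`meta`/`content` are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed timestamp format


@dataclass
class Node:
    tag: str
    items: list = field(default_factory=list)  # child Nodes and text, in order

    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def signature(self):
        # Pre-order tag sequence: a crude stand-in for the subtree's layout.
        return [self.tag] + [t for c in self.children() for t in c.signature()]

    def texts(self):
        # Non-empty text fragments of the subtree, in document order.
        return [t for i in self.items
                for t in ([i] if isinstance(i, str) else i.texts())]


class TreeBuilder(HTMLParser):
    """Builds a minimal Node tree; tolerant of unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].items.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].items.append(data.strip())


def find_topic_block(root, threshold=0.8):
    """Step 1 (sketch): the largest group of sibling subtrees with similar layout."""
    best, pending = [], [root]
    while pending:
        node = pending.pop()
        kids = node.children()
        pending.extend(kids)
        sigs = [k.signature() for k in kids]
        for seed in sigs:
            group = [k for k, s in zip(kids, sigs)
                     if SequenceMatcher(None, seed, s).ratio() >= threshold]
            if len(group) > len(best):  # ties keep the group found first
                best = group
    return best


def label_fields(records, min_support=0.8, max_meta_len=30):
    """Step 2 (sketch): label text positions using statistics across records."""
    rows = [r.texts() for r in records]
    if not rows:
        return []
    labels = []
    for pos in range(min(len(r) for r in rows)):
        values = [r[pos] for r in rows]
        date_share = sum(bool(DATE.search(v)) for v in values) / len(values)
        mean_len = sum(len(v) for v in values) / len(values)
        if date_share >= min_support:
            labels.append("time")
        elif mean_len <= max_meta_len:
            labels.append("meta")
        else:
            labels.append("content")
    return labels


def extract(html, threshold=0.8):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = find_topic_block(builder.root, threshold)
    labels = label_fields(records)
    return [list(zip(labels, r.texts())) for r in records]


SAMPLE_PAGE = """<html><body>
<ul><li>Home</li><li>Board</li></ul>
<div id="thread">
<div class="post"><span>alice</span><span>2009-05-01 10:00</span><p>I think the new layout of this forum is much easier to read.</p></div>
<div class="post"><span>bob</span><span>2009-05-01 11:30</span><p>Agreed, although the search page still loads slowly for me.</p></div>
<div class="post"><span>carol</span><span>2009-05-02 08:15</span><p>Has anyone reported the slow search page to the moderators?</p></div>
</div>
</body></html>"""
```

On the hypothetical `SAMPLE_PAGE`, the three `post` divs form the largest group of layout-similar siblings, so they are taken as the topic block while the two-item navigation list is ignored; each record then comes back as labeled pairs such as `('meta', 'alice')`, `('time', '2009-05-01 10:00')` and a `content` entry. Comparing tag sequences keeps the sketch dependency-free, but a real system would likely need a sturdier tree similarity and learned rather than hard-coded thresholds.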

Bibliographic Information

Source: International Conference on Natural Language Processing and Knowledge Engineering, 2009, 5 pages
Authors: Yun WANG; Bicheng LI; Chen LIN
Original format: PDF
Chinese Library Classification: TP312-53
Keywords: Web forum; Data extraction; Similarity

Similar Literature

1. Web data extraction based on structural similarity [J]. Li Z, Ng WK, Sun AX. Knowledge and Information Systems, 2005, no. 4.
2. Web data extraction based on structural similarity [J]. Zhao Li, Wee Keong Ng, Aixin Sun. Knowledge and Information Systems, 2005, no. 4.
3. Layout-based computation of web page similarity ranks [J]. Bozkir Ahmet Selman, Sezer Ebru Akcapinar. International Journal of Human-Computer Studies, 2018.
4. Data Extraction from Web Forums Based on Similarity of Page Layout [C]. Yun WANG, Bicheng LI, Chen LIN. Proceedings of International Conference on Natural Language Processing and Knowledge Engineering, 2009.
5. Feature extraction and similarity-based analysis for proteome and genome databases [D]. Ozturk, Ozgur. 2007.
6. Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation [O]. Bissan Audeh, Michel Beigbeder, Antoine Zimmermann.
7. Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation [O]. Audeh, Bissan, Beigbeder, Michel, Zimmermann, Antoine. 2017.
