InForCE: Forum data crawling with information extraction

Abstract

Forum data acquisition is a prerequisite for forum data analysis tasks such as opinion analysis and online advertising. Since the structure of forum data is usually closely tied to the page structure, effective forum data acquisition requires integrating Web page crawling with information extraction. In this paper, we propose a system, InForCE, for this purpose. The system consists of two parts. First, we download Web pages from different forums and generate HTML documents. Second, we extract structured data from these HTML documents according to user requirements. For the extraction step, we propose a novel algorithm that automatically transforms user requirements into XSLT. Our experimental results show that structured data extraction is feasible and efficient.
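The abstract does not say how a user requirement is turned into XSLT. As a rough illustration only, the Python sketch below assumes that a requirement is given as one XPath that selects each forum post, plus a mapping from output field names to XPaths relative to that post. From these it emits an XSLT 1.0 stylesheet. The function name `requirement_to_xslt`, this input format, and the `posts`/`post` output elements are hypothetical. This is not the InForCE algorithm.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XSLT 1.0 Recommendation.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)


def xsl(tag: str) -> str:
    """Return an element name in the XSLT namespace ({uri}local form)."""
    return f"{{{XSL_NS}}}{tag}"


def requirement_to_xslt(record_xpath: str, fields: dict[str, str]) -> str:
    """Build an XSLT 1.0 stylesheet from a hypothetical user requirement.

    record_xpath: XPath selecting one node per forum post (assumed format).
    fields: output field name -> XPath relative to that node (assumed format);
            field names must be valid XML element names.
    """
    sheet = ET.Element(xsl("stylesheet"), {"version": "1.0"})
    ET.SubElement(sheet, xsl("output"), {"method": "xml", "indent": "yes"})
    # A single template on the document root drives the whole extraction.
    root_template = ET.SubElement(sheet, xsl("template"), {"match": "/"})
    posts = ET.SubElement(root_template, "posts")  # literal result element
    # Emit one <post> per node matched by the record XPath.
    loop = ET.SubElement(posts, xsl("for-each"), {"select": record_xpath})
    post = ET.SubElement(loop, "post")
    for name, path in fields.items():
        field = ET.SubElement(post, name)
        # value-of copies the string value of the first node the path matches.
        ET.SubElement(field, xsl("value-of"), {"select": path})
    return ET.tostring(sheet, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical requirement: class names are illustrative, not from the paper.
    print(requirement_to_xslt(
        "//div[@class='post']",
        {"author": ".//span[@class='author']", "body": ".//div[@class='content']"},
    ))
```

The Python standard library has no XSLT engine, so this sketch only generates the stylesheet. Applying it to a crawled page would need an XSLT 1.0 processor, for example `lxml.etree.XSLT`, after the HTML has been parsed into a tree.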

Bibliographic Details

Source: Proceedings of 2010 4th International Universal Communication Symposium | 2010 | pp. 367-373 | 7 pages
Original format: PDF
Chinese Library Classification: Communication systems (transmission systems)

Similar Documents

1. Enabling maps/location searches on mobile devices: constructing a POI database via focused crawling and information extraction [J]. Chuang Hsiu-Min, Chang Chia-Hui, Kao Ting-Yao. International Journal of Geographical Information Science, 2016, no. 7a8.

2. OXPath: A language for scalable data extraction, automation, and crawling on the deep web [J]. Tim Furche, Georg Gottlob, Giovanni Grasso. The VLDB Journal, 2013, no. 1.

3. InForCE: Forum data crawling with information extraction [C]. {missing}. International Universal Communication Symposium, 2010.

4. A model for assessing and improving quality of textual data from product-related discussion forums [D]. Ilikchyan, Armen. 2015.

5. Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation [O]. Bissan Audeh, Michel Beigbeder, Antoine Zimmermann.

6. Board Forum Crawling: A Web Crawling Method for Web Forum [O]. Yan Guo, Kui Li, Kai Zhang. 2006.
