Conference: International Conference on Natural Language Processing and Knowledge Engineering
Data Extraction from Web Forums Based on Similarity of Page Layout

Abstract

Web forums contain a wealth of information resources. Forum data can be used widely in areas such as Internet community mining, information retrieval, and public opinion analysis. This paper addresses two questions: what should be extracted from web forums, and how to extract it. To overcome the limitations of current methods for extracting data from web forums, an automated method is proposed to extract metadata from web forum pages. The method proceeds in two steps. First, it recognizes the topic-block by exploiting the special layout of web forum pages. Then it extracts metadata from the topic-block by exploiting the statistical regularity of the metadata. The whole process requires no manual work. Experimental results show that the method performs well in both adjustability and accuracy.
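
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a two-step pipeline of this kind might look, not the authors' implementation. It assumes that the topic-block is the largest group of sibling elements sharing the same tag structure, and that a metadata field counts as "statistically regular" when its pattern matches in most records. All names here (Node, find_topic_records, extract_metadata, FIELD_PATTERNS, min_support, SAMPLE) are hypothetical and chosen for illustration.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []

    def signature(self):
        # Structural signature: this tag plus the signatures of its children.
        return self.tag + "(" + ",".join(c.signature() for c in self.children) + ")"

    def all_text(self):
        parts = self.text + [c.all_text() for c in self.children]
        return " ".join(p for p in parts if p)

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)

def find_topic_records(root):
    """Step 1 (assumed heuristic): pick the sibling group with the same layout
    that maximizes repetitions times signature length."""
    best, best_score = [], 0
    for node in iter_nodes(root):
        groups = {}
        for child in node.children:
            groups.setdefault(child.signature(), []).append(child)
        for sig, members in groups.items():
            score = len(members) * len(sig) if len(members) >= 2 else 0
            if score > best_score:
                best, best_score = members, score
    return best

# Hypothetical field patterns; a real system would need many more.
FIELD_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "email": re.compile(r"[\w.]+@[\w.]+"),
}

def extract_metadata(records, min_support=0.8):
    """Step 2 (assumed heuristic): keep a field only if its pattern matches
    in at least min_support of the records."""
    if not records:
        return []
    texts = [r.all_text() for r in records]
    rows = []
    for record in records:
        link = next((n for n in iter_nodes(record) if n.tag == "a"), None)
        rows.append({"title": link.all_text() if link else None})
    for name, pattern in FIELD_PATTERNS.items():
        matches = [pattern.search(t) for t in texts]
        if sum(m is not None for m in matches) / len(records) >= min_support:
            for row, m in zip(rows, matches):
                row[name] = m.group(0) if m else None
    return rows

SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a><a href="/faq">FAQ</a></div>
<table>
<tr><td><a href="/t/1">Install problem</a></td><td>alice</td><td>2010-03-01</td></tr>
<tr><td><a href="/t/2">Parser question</a></td><td>bob</td><td>2010-03-02</td></tr>
<tr><td><a href="/t/3">Release notes</a></td><td>carol</td><td>2010-03-05</td></tr>
</table>
</body></html>
"""

for row in extract_metadata(find_topic_records(parse(SAMPLE))):
    print(row)
```

On the sample page, the three table rows outscore the two navigation links because they repeat more often and have a larger structure, so they are treated as the topic-block. The "email" pattern matches no record and is dropped, while "date" matches every record and is kept. Real forum pages are far noisier than this sample, so the scoring rule and the support threshold would need tuning.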