EXTRACTING STRUCTURED DATA FROM WEB FORUMS

Abstract

The web forum data extraction technique extracts structured data from web forums using both page-level information and site-level knowledge. To do this, the technique determines what kinds of page objects a forum site has, which object each page belongs to, and how the different page objects are connected to each other. This information can be obtained by reconstructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The technique collects three kinds of evidence for data extraction: 1) inner-page features, which cover both semantic and layout information on an individual page; 2) inter-vertex features, which describe linkage-related observations; and 3) inner-vertex features, which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine these types of evidence statistically for inference and thereby extract the desired structures.
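The statistical combination of evidence can be illustrated with a toy example. The sketch below is not the patented method: it is a minimal log-linear model in the spirit of a Markov Logic Network, where each labeling ("world") is scored by the summed weights of the ground formulas it satisfies, and probabilities are proportional to the exponential of that score. The page names, the predicates `many_timestamps` and `linked_from_list`, the two labels, and all weights are hypothetical, and exhaustive enumeration stands in for real MLN inference, which would not scale this way.

```python
import itertools
import math

LABELS = ("list", "post")

# Hypothetical evidence for two pages that fall into the same sitemap vertex.
EVIDENCE = {
    "p1": {"many_timestamps": True, "linked_from_list": True},
    "p2": {"many_timestamps": False, "linked_from_list": True},
}
SAME_VERTEX = [("p1", "p2")]

# Hypothetical formula weights, one per kind of evidence named in the abstract.
WEIGHTS = {"inner_page": 1.5, "inter_vertex": 1.0, "inner_vertex": 2.0}


def implies(a, b):
    """Truth value of the implication a => b."""
    return (not a) or b


def world_score(world):
    """Sum the weights of all ground formulas satisfied by one labeling."""
    score = 0.0
    for page, label in world.items():
        ev = EVIDENCE[page]
        # Inner-page feature: many_timestamps(p) => post(p)
        if implies(ev["many_timestamps"], label == "post"):
            score += WEIGHTS["inner_page"]
        # Inter-vertex feature: linked_from_list(p) => post(p)
        if implies(ev["linked_from_list"], label == "post"):
            score += WEIGHTS["inter_vertex"]
    # Inner-vertex feature: pages in one vertex tend to share a label.
    for p, q in SAME_VERTEX:
        if world[p] == world[q]:
            score += WEIGHTS["inner_vertex"]
    return score


def all_worlds():
    """Yield every possible assignment of a label to each page."""
    pages = sorted(EVIDENCE)
    for labels in itertools.product(LABELS, repeat=len(pages)):
        yield dict(zip(pages, labels))


def marginal(page, label):
    """P(page has label), computed by brute-force enumeration of worlds."""
    z = sum(math.exp(world_score(w)) for w in all_worlds())
    mass = sum(math.exp(world_score(w)) for w in all_worlds() if w[page] == label)
    return mass / z


def map_world():
    """Return the single most probable labeling."""
    return max(all_worlds(), key=world_score)
```

In this toy setup, `p2` has only weak page-level evidence of its own, but the inner-vertex formula pulls its label toward that of `p1`, which is the kind of site-level reinforcement the abstract describes.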
