EXTRACTING STRUCTURED DATA FROM WEB FORUMS
Abstract
The web forum data extraction technique extracts structured data from web forums using both page-level information and site-level knowledge. To do this, the technique determines which kinds of page objects a forum site has, which object each page belongs to, and how the different page objects are connected to one another. This information can be obtained by reconstructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The technique collects three kinds of evidence for data extraction: 1) inner-page features, which cover both semantic and layout information on an individual page; 2) inter-vertex features, which describe linkage-related observations; and 3) inner-vertex features, which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine these types of evidence statistically for inference and can thereby extract the desired structures.
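As a rough illustration of how weighted evidence might be combined statistically, the sketch below scores candidate labels for a single page element with a log-linear model of the kind that underlies Markov Logic Networks. It is a simplification under stated assumptions, not the method of the abstract: the rule names, weights, and labels are hypothetical, and a full Markov Logic Network would perform joint inference over many elements rather than labeling one in isolation.

```python
import math

# Hypothetical weighted rules: (evidence name, label it supports, weight).
# The names and weights are illustrative assumptions, not from the source.
RULES = [
    ("has_long_text", "post_content", 1.5),          # inner-page evidence
    ("links_to_profile_vertex", "author", 2.0),      # inter-vertex evidence
    ("aligned_across_pages", "post_content", 1.0),   # inner-vertex evidence
    ("aligned_across_pages", "author", 1.0),         # inner-vertex evidence
]
LABELS = ["post_content", "author", "other"]


def label_posterior(evidence):
    """Return P(label | evidence) for one element under a log-linear model."""
    scores = {}
    for label in LABELS:
        # Sum the weights of rules whose evidence holds and which support
        # this label; rules with false evidence add the same constant to
        # every label and therefore cancel out in the normalization.
        scores[label] = sum(
            weight
            for name, target, weight in RULES
            if target == label and evidence.get(name, False)
        )
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    return {label: math.exp(s) / z for label, s in scores.items()}


def most_likely_label(evidence):
    """Pick the label with the highest posterior probability."""
    posterior = label_posterior(evidence)
    return max(posterior, key=posterior.get)
```

For example, calling most_likely_label({"links_to_profile_vertex": True, "aligned_across_pages": True}) favors "author", because two rules supporting that label fire while only one supports "post_content".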