Journal: Computer Engineering and Design (《计算机工程与设计》)

An Unsupervised Web Event Extraction Framework

Abstract

To obtain information about real-world events published on the Internet efficiently and conveniently, an unsupervised web event extraction framework is proposed. The framework extracts events from table pages by exploiting the parallel structure of the DOM tree. The events extracted from table pages then serve as seeds for summarizing the corresponding patterns in detail pages, and the summarized patterns are used to extract further events from those detail pages. The framework was applied to pages from a large number of websites, and its extraction results were compared with those of a commonly used wrapper-generation algorithm. The results show that the framework is effective and that its extraction quality on detail pages is better than that of the wrapper-generation algorithm.
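The three stages described in the abstract (table-page extraction, seed-based pattern summarization, detail-page extraction) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: the table page is assumed to be well-formed XHTML that `xml.etree.ElementTree` can parse, "parallel structure" is approximated by sibling subtrees that share the same tag signature, and a "pattern" is approximated by a fixed-width left/right text context around a seed value, kept only when at least two seeds agree on it. All function names and parameters (`width`, `min_support`) are hypothetical.

```python
# Hypothetical sketch of a seed-bootstrapped event extractor; not the paper's method.
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def text_of(elem):
    """Whitespace-normalized text of an element and its descendants."""
    return " ".join("".join(elem.itertext()).split())


def signature(elem):
    """Structural signature: the element's tag plus the tags of its children."""
    return (elem.tag, tuple(child.tag for child in elem))


def extract_parallel_records(root, min_repeats=2):
    """Stage 1: treat sibling subtrees sharing a signature as parallel
    records (e.g. table rows) and return each record's field texts."""
    records = []
    for parent in root.iter():
        groups = defaultdict(list)
        for child in parent:
            if len(child):  # skip leaf fields such as <td>text</td>
                groups[signature(child)].append(child)
        for members in groups.values():
            if len(members) >= min_repeats:
                records.extend([text_of(f) for f in m] for m in members)
    return records


def induce_pattern(page_text, value, width=6):
    """Stage 2 helper (assumed): fixed-width left/right context of a seed value."""
    i = page_text.find(value)
    if i < 0:
        return None
    prefix = page_text[max(0, i - width):i]
    suffix = page_text[i + len(value):i + len(value) + width]
    # Require both contexts; otherwise the pattern cannot delimit a value.
    return (prefix, suffix) if prefix and suffix else None


def summarize_patterns(seeds, detail_pages, field, width=6, min_support=2):
    """Stage 2: keep only patterns that several seed records agree on."""
    counts = Counter()
    for record in seeds:
        for page in detail_pages:
            pattern = induce_pattern(page, record[field], width)
            if pattern:
                counts[pattern] += 1
    return [p for p, n in counts.items() if n >= min_support]


def apply_patterns(detail_pages, patterns):
    """Stage 3: extract values from detail pages, including unseen events."""
    found = set()
    for prefix, suffix in patterns:
        regex = re.compile(re.escape(prefix) + r"(.+?)" + re.escape(suffix))
        for page in detail_pages:
            found.update(regex.findall(page))
    return sorted(found)


# Usage on toy data (invented for illustration only).
table_page = ET.fromstring(
    "<html><body><table>"
    "<tr><th>Event</th><th>Date</th></tr>"
    "<tr><td>Alpha Summit</td><td>2014-05-01</td></tr>"
    "<tr><td>Beta Forum</td><td>2014-06-02</td></tr>"
    "</table></body></html>"
)
details = [
    "Event: Alpha Summit; Date: 2014-05-01; Venue: Beijing.",
    "Event: Beta Forum; Date: 2014-06-02; Venue: Shanghai.",
    "Event: Gamma Expo; Date: 2014-07-03; Venue: Wuhan.",
]
seeds = extract_parallel_records(table_page)
titles = apply_patterns(details, summarize_patterns(seeds, details, field=0))
print(titles)  # ['Alpha Summit', 'Beta Forum', 'Gamma Expo']
```

The point of the toy run is that "Gamma Expo" never appears in the table page, yet it is recovered from its detail page because the pattern learned from the two table seeds generalizes. A real system would need HTML-tolerant parsing and more robust pattern generalization than fixed-width contexts.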
