Wuhan University Journal of Natural Sciences

A Classification Method for Web Information Extraction

Abstract

Web information extraction is treated as a classification process, and a competing classification method is presented that extracts Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are defined on the basis of these features. Fragments compete for the slots of an information template; through this competition the method classifies fragments into slot classes and filters out noise. Far fewer annotated samples are needed than with rule-based methods, so the method is highly portable. Experiments show that the method performs well and outperforms a DOM-based method in information extraction.
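
The abstract does not name the three features or give the competition rule, so the following Python sketch only illustrates the general idea under stated assumptions. The `Fragment` fields, the three stand-in similarities (token overlap, tag-path overlap, length ratio), the equal weighting, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    text: str      # visible text of the Web fragment
    tag_path: str  # hypothetical feature, e.g. "html/body/div/span"


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x: Fragment, y: Fragment) -> float:
    """Equal-weight average of three stand-in feature similarities (assumed)."""
    tokens = jaccard(set(x.text.lower().split()), set(y.text.lower().split()))
    path = jaccard(set(x.tag_path.split("/")), set(y.tag_path.split("/")))
    longest = max(len(x.text), len(y.text))
    length = 1.0 if longest == 0 else min(len(x.text), len(y.text)) / longest
    return (tokens + path + length) / 3.0


def compete(fragments, samples, threshold=0.5):
    """Let fragments compete for slots; return {slot: winning fragment}.

    `samples` maps each slot name to a list of annotated example fragments.
    A fragment bids for the slot whose examples it resembles most. Bids
    below `threshold` are dropped as noise, and the highest bid per slot wins.
    """
    winners = {}  # slot -> (score, fragment)
    for frag in fragments:
        bids = [
            (max((similarity(frag, ex) for ex in examples), default=0.0), slot)
            for slot, examples in samples.items()
        ]
        if not bids:
            continue
        score, slot = max(bids)
        if score < threshold:
            continue  # treated as noise
        if slot not in winners or score > winners[slot][0]:
            winners[slot] = (score, frag)
    return {slot: frag for slot, (_, frag) in winners.items()}
```

In this sketch, one annotated example per slot is enough to run the competition, which loosely mirrors the abstract's point about needing few annotated samples. Fragments that win no slot are simply absent from the returned mapping.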
