SELF-LEARNING BASED CRAWLING AND RULE-BASED DATA MINING FOR AUTOMATIC INFORMATION EXTRACTION

Abstract

Methods and systems are provided for automatic information extraction through self-learning crawling and rule-based data mining. The method determines whether a crawl policy exists within the input information and performs at least one of front-end crawling, assisted crawling, and recursive crawling. The downloaded data set is pre-processed to remove noisy data and then subjected to classification rules and decision-tree-based data mining to extract meaningful information. These crawling techniques yield smaller, relevant data sets pertaining to a specific domain from the multi-dimensional data sets available in online and offline sources.
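
The abstract names the stages but not how they are implemented, so the following is only a minimal sketch of one way the stages could fit together, not the patented method. Everything in it is an assumption made for illustration: the in-memory `PAGES` table standing in for online sources, the `policy` dictionary with a `max_depth` key, the fallback to a depth of zero when no policy is present, and the keyword tests in the hand-written decision tree. Only recursive crawling is shown; front-end and assisted crawling are omitted.

```python
import re

# Hypothetical in-memory "web": page id -> (text, outgoing links).
PAGES = {
    "home": ("Welcome! <b>Patent search</b> portal", ["p1", "p2"]),
    "p1": ("Patent: crawler for data mining, filed 2015", ["p2"]),
    "p2": ("   ", []),  # noisy page: whitespace only
}


def has_crawl_policy(request):
    """Check whether the input information carries a crawl policy."""
    return bool(request.get("policy"))


def recursive_crawl(seed, max_depth):
    """Breadth-first, depth-limited traversal that follows links from the seed."""
    seen, frontier, downloaded = set(), [(seed, 0)], []
    while frontier:
        page, depth = frontier.pop(0)
        if page in seen or page not in PAGES or depth > max_depth:
            continue
        seen.add(page)
        text, links = PAGES[page]
        downloaded.append(text)
        frontier.extend((link, depth + 1) for link in links)
    return downloaded


def preprocess(records):
    """Strip markup and drop empty records (a stand-in for noise removal)."""
    cleaned = (re.sub(r"<[^>]+>", "", r).strip() for r in records)
    return [r for r in cleaned if r]


def classify(record):
    """A hand-written two-level decision tree over keyword tests."""
    text = record.lower()
    if "patent" in text:
        # Second-level test: a four-digit number suggests a dated record.
        return "patent_record" if re.search(r"\b\d{4}\b", text) else "patent_portal"
    return "other"


def extract(request):
    """Crawl, clean, and label records for one request."""
    # Assumption: without a policy, crawl only the seed page (depth 0).
    depth = request["policy"]["max_depth"] if has_crawl_policy(request) else 0
    records = preprocess(recursive_crawl(request["seed"], depth))
    return [(classify(r), r) for r in records]
```

Calling `extract({"seed": "home", "policy": {"max_depth": 1}})` crawls all three hypothetical pages, discards the whitespace-only one, and labels the remaining two. In a real system the rules in `classify` would more plausibly be learned from labeled data than written by hand.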
