WEB INFORMATION COLLECTION DEVICE, WEB CRAWLER PROGRAM AND WEB INFORMATION COLLECTION METHOD

Abstract

PROBLEM TO BE SOLVED: To provide a Web information collection device, a Web crawler program and a Web information collection method that require no manual creation or updating of learning data, automatically collect only the targeted Web pages, and collect as little noise information as possible.

SOLUTION: The Web information collection device 2 has a control part 4 that cyclically acquires Web information from Web servers 3a, 3b, 3c and decides whether that Web information is target information related to a specific field or non-target information. The control part acquires second Web information about the link source, extracts the words included in its text portion, and learns the cumulative frequency of each word separately for the target and non-target cases. It then calculates a priority from the appearance frequency of the extracted words and the learned cumulative frequencies, gives preferential access to link destinations with higher priority, and collects Web information while tracing link information.

COPYRIGHT: (C)2006, JPO&NCIPI
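
The loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not give the priority formula, so the smoothed log-ratio score below is an assumption, and the names `PriorityLearner`, `crawl`, `fetch` and `is_target` are hypothetical. `fetch(url)` is assumed to return the page's words and outgoing links, and `is_target(words)` stands in for the target/non-target decision.

```python
import heapq
import math
from collections import Counter


class PriorityLearner:
    """Cumulative word frequencies of link-source pages, split by outcome."""

    def __init__(self):
        self.target_counts = Counter()      # sources that led to target pages
        self.non_target_counts = Counter()  # sources that led to non-target pages

    def learn(self, source_words, is_target):
        # Accumulate the link source's words under the observed outcome.
        counts = self.target_counts if is_target else self.non_target_counts
        counts.update(source_words)

    def priority(self, source_words):
        # Assumed formula: word frequency times a smoothed log ratio of the
        # learned cumulative frequencies (add-one smoothing).
        score = 0.0
        for word, freq in Counter(source_words).items():
            t = self.target_counts[word] + 1
            n = self.non_target_counts[word] + 1
            score += freq * math.log(t / n)
        return score


def crawl(seeds, fetch, is_target, max_pages=100):
    """Fetch pages in priority order and return the URLs judged to be targets."""
    learner = PriorityLearner()
    frontier = []  # min-heap of (-priority, insertion order, url, source words)
    for order, url in enumerate(seeds):
        heapq.heappush(frontier, (0.0, order, url, ()))
    seen = set(seeds)
    collected = []
    order = len(seeds)
    fetched = 0
    while frontier and fetched < max_pages:
        _, _, url, source_words = heapq.heappop(frontier)
        words, links = fetch(url)
        fetched += 1
        target = is_target(words)
        if target:
            collected.append(url)
        # Learn from the page that linked here, labelled by this page's outcome.
        learner.learn(source_words, target)
        # This page is the link source for its own outgoing links.
        score = learner.priority(words)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, order, link, tuple(words)))
                order += 1
    return collected
```

Because the labels come from the crawler's own target/non-target decisions, no hand-built training set is needed, which matches the stated goal. As a simplification, each link's priority is fixed when it is queued and is not recomputed as the learned frequencies change.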

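Bibliographic data

Publication number: JP2005346598A

Patent type:
Publication date: 2005-12-15

Original format: PDF
Applicant/patentee: SANGAKU RENKEI KIKO KYUSHU:KK

Application number: JP20040168034
Inventors: HIROKAWA SACHIO; MATSUNAGA YOSHIHIRO; NOGUCHI MASATO

Filing date: 2004-06-07
Classification: G06F17/30; G06F13/00; G06N3/08
Country: JP
Date added to database: 2022-08-21 21:53:17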
