System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages

Abstract

A focused random walk system produces samples of on-topic pages from a collection of hyper-linked pages such as Web pages. The focused random walk system utilizes a focused random walk to produce a focused sample, which is a random sample of Web pages focused on a topic. The focused random walk system uniformly samples pages iteratively, where each iteration follows a random link from a union of the in-links and out-links of a page. The system then classifies this randomly selected link to determine whether the page is on-topic. The random walk sampling process could comprise a hard-focus method that selects only on-topic pages at each step of the focused random walk, or a soft-focus method that allows limited divergence to off-topic pages.
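
The walk described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `max_off_topic` parameter, the in-memory link dictionary, and the caller-supplied `is_on_topic` classifier are all hypothetical choices made for illustration. The sketch also assumes that the start page is on-topic and that a soft-focus walk returns to the last on-topic page once its divergence budget is used up; the abstract does not specify either detail. It makes no attempt to correct for bias toward highly linked pages.

```python
import random
from collections import defaultdict


def build_neighbors(out_links):
    """Map each page to the union of its in-links and out-links."""
    neighbors = defaultdict(set)
    for page, targets in out_links.items():
        neighbors[page]  # make sure pages without out-links still appear
        for target in targets:
            neighbors[page].add(target)   # out-link of `page`
            neighbors[target].add(page)   # in-link of `target`
    return {page: sorted(links) for page, links in neighbors.items()}


def focused_random_walk(out_links, start, is_on_topic, steps,
                        max_off_topic=0, rng=None):
    """Return the on-topic pages accepted during the walk (repeats allowed).

    max_off_topic=0 gives a hard-focus walk: only on-topic pages are entered.
    A positive value gives a soft-focus walk that tolerates that many
    consecutive off-topic steps (an assumed reading of "limited divergence").
    """
    rng = rng or random.Random()
    neighbors = build_neighbors(out_links)
    current = start
    last_on_topic = start      # assumption: the start page is on-topic
    off_topic_run = 0
    samples = []
    for _ in range(steps):
        candidates = neighbors.get(current, [])
        if not candidates:
            break
        # Follow a random link from the union of in-links and out-links.
        candidate = rng.choice(candidates)
        if is_on_topic(candidate):
            current = last_on_topic = candidate
            off_topic_run = 0
            samples.append(candidate)
        elif off_topic_run < max_off_topic:
            # Soft focus: step onto an off-topic page, within the budget.
            current = candidate
            off_topic_run += 1
        else:
            # Budget exhausted (always the case for hard focus):
            # resume from the last on-topic page.
            current = last_on_topic
            off_topic_run = 0
    return samples
```

Calling `focused_random_walk(graph, "a", classifier, steps=1000)` runs the hard-focus variant, and passing `max_off_topic=2` runs a soft-focus variant; in both cases only pages the classifier accepts are added to the sample.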
