Proceedings of the National Academy of Sciences of the United States of America

Growing and navigating the small world Web by local content

Abstract

Can we model the scale-free distribution of Web hypertext degree under realistic assumptions about the behavior of page authors? Can a Web crawler efficiently locate an unknown relevant page? These questions are receiving much attention due to their potential impact for understanding the structure of the Web and for building better search engines. Here I investigate the connection between the linkage and content topology of Web pages. The relationship between a text-induced distance metric and a link-based neighborhood probability distribution displays a phase transition between a region where linkage is not determined by content and one where linkage decays according to a power law. This relationship is used to propose a Web growth model that is shown to accurately predict the distribution of Web page degree, based on textual content and assuming only local knowledge of degree for existing pages. A qualitatively similar phase transition is found between linkage and semantic distance, with an exponential decay tail. Both relationships suggest that efficient paths can be discovered by decentralized Web navigation algorithms based on textual and/or categorical cues.
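The growth model described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's implementation: the abstract gives no formula, so the threshold `rho`, the exponent `alpha`, the constant `c`, the one-dimensional "content" coordinate, and the choice of which side of the threshold is content-independent are all assumptions made for illustration. It only shows the general shape of the idea: below a content-distance threshold a new page picks targets by degree alone, and beyond it the chance of linking decays as a power law of distance.

```python
import random

def link_probability(distance, degree, total_degree, rho=0.3, alpha=2.0, c=0.001):
    """Toy link weight between a new page and an existing page.

    All parameters (rho, alpha, c) are hypothetical, not values from the paper.
    """
    if distance < rho:
        # Assumed content-independent regime: depends only on the target's degree.
        return degree / total_degree
    # Assumed content-dependent regime: power-law decay with content distance.
    return min(1.0, c * distance ** (-alpha))

def grow(n_pages, m=2, seed=0, **params):
    """Grow a toy network; each new page links to up to m existing pages."""
    rng = random.Random(seed)
    content = [rng.random(), rng.random()]  # hypothetical 1-D content coordinates
    degree = [1, 1]                         # two seed pages joined by one link
    edges = [(1, 0)]
    for t in range(2, n_pages):
        x = rng.random()
        total = sum(degree)
        pool = list(range(t))
        # The values are used as relative sampling weights, not exact probabilities.
        weights = [
            link_probability(abs(x - content[i]), degree[i], total, **params)
            for i in pool
        ]
        targets = []
        for _ in range(min(m, t)):
            # Draw one target, then remove it so targets stay distinct.
            j = rng.choices(range(len(pool)), weights=weights)[0]
            targets.append(pool.pop(j))
            weights.pop(j)
        content.append(x)
        degree.append(0)
        for i in targets:
            edges.append((t, i))
            degree[i] += 1
            degree[t] += 1
    return edges, degree

edges, degree = grow(50, m=2)
```

A histogram of `degree` for larger `n_pages` gives a rough feel for how the two regimes shape the degree distribution, though nothing here reproduces the paper's reported fit.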
