Structural mining of large-scale behavioral data from the Internet.

Abstract

As the Internet becomes ever more pervasive in the lives of hundreds of millions of people, our understanding of its physical structure has outpaced our understanding of the dynamic patterns of traffic generated by its users. This work aims to develop a better understanding of the structure of Internet traffic in a manner consistent with individual privacy and computational constraints. I first examine network flow data from the Internet2 network, using it to form "behavioral networks" based on the flows attributable to specific network applications. The heavy-tailed distributions associated with these networks suggest unbounded variance and poorly defined means in distributions of user behavior. However, a novel application of hierarchical clustering to similarity data derived from these networks makes it possible to classify network applications robustly based on their observed behavior. I then focus on Web traffic, using a large collection of HTTP request data to build a weighted subset of the Web graph. Analysis of this weighted graph reveals more heavy-tailed distributions and the presence of a large body of stationary traffic. The traffic data are also shown to contradict key assumptions of the random surfer model used by PageRank. I conclude with the development of ABC, a behaviorally plausible agent-based model of Web traffic that incorporates backtracking, bookmarks, and a sense of topical locality. The ABC model is shown to approximate real user activity more accurately than PageRank on both artificial and empirically generated graphs.

Bibliographic record

  • Author: Meiss, Mark
  • Author affiliation: Indiana University
  • Degree-granting institution: Indiana University
  • Subjects: Web Studies; Computer Science; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 321 p.
  • Total pages: 321
  • Original format: PDF
  • Language of text: eng
