Conference: IEEE International Conference on Social Computing

Using Text Analysis to Understand the Structure and Dynamics of the World Wide Web as a Multi-Relational Graph

Abstract

A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context (e.g., informational, adversarial, commercial, spam) behind the hyperlink is absent. In this work-in-progress paper, we describe an ongoing collaborative project between two teams, one specializing in network science and analysis and the other specializing in text analysis and machine learning, to address this oversimplification. Using techniques in natural language processing, text mining, and machine learning to extract relevant features of hyperlinks and classify them into one of several types, this undertaking builds and analyzes a multi-relational web graph. A key aspect of this work is that the multi-relational graph emerges naturally from the data instead of being based on an imposed classification of the hyperlinks.
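To make the contrast between the traditional Web graph and a multi-relational one concrete, the following is a minimal Python sketch, not the authors' implementation. The class name `MultiRelationalGraph`, its methods, the `*.example` page names, and the relation labels are all hypothetical. The labels are borrowed from the contexts the abstract lists purely for illustration; in the project described, the link types are meant to emerge from the data rather than be fixed in advance.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MultiRelationalGraph:
    """Directed graph whose edges carry a relation label (hypothetical sketch)."""

    def __init__(self) -> None:
        # relation label -> source page -> set of target pages
        self._adj: Dict[str, Dict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add_link(self, source: str, target: str, relation: str) -> None:
        """Record a hyperlink from source to target under the given relation."""
        self._adj[relation][source].add(target)

    def relations(self) -> List[str]:
        """Return all relation labels seen so far, sorted."""
        return sorted(self._adj)

    def edges(self, relation: str) -> List[Tuple[str, str]]:
        """Return the sorted edges of a single relation layer."""
        layer = self._adj.get(relation, {})
        return sorted((s, t) for s, targets in layer.items() for t in targets)

    def collapse(self) -> List[Tuple[str, str]]:
        """Drop the labels to recover the traditional single-relation Web graph."""
        pairs = {
            (s, t)
            for layer in self._adj.values()
            for s, targets in layer.items()
            for t in targets
        }
        return sorted(pairs)


# Placeholder pages and labels, assumed for illustration only.
graph = MultiRelationalGraph()
graph.add_link("a.example", "b.example", "informational")
graph.add_link("a.example", "b.example", "commercial")
graph.add_link("c.example", "a.example", "spam")

print(graph.relations())    # labels present in the graph
print(graph.edges("spam"))  # edges in the "spam" layer only
print(graph.collapse())     # the unlabeled graph, one edge per page pair
```

In this sketch, the two differently labeled links from `a.example` to `b.example` collapse to a single edge in the traditional graph, which is the loss of context the abstract describes.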