AAAI Conference on Artificial Intelligence

Improving Learning in Networked Data by Combining Explicit and Mined Links

Abstract

This paper is about using multiple types of information to classify networked data in a semi-supervised setting: given a fully described network (nodes and edges) in which some nodes have known labels, predict the labels of the remaining nodes. One method recently developed for such inference is a guilt-by-association model. This method has been developed independently in two different settings: relational learning and semi-supervised learning. Relational learning assumes that the networked data has explicit links, such as hyperlinks between web pages or citations between research papers. The semi-supervised setting assumes a corpus of non-relational data and creates links based on similarity measures between instances. Both use only the known labels in the network to predict the remaining labels, but they draw on very different information sources. The thesis of this paper is that combining these two types of links yields a network that carries more information than either type of link alone. We test this thesis on six benchmark data sets using a within-network learning algorithm and show that combining the links yields significant improvements in predictive performance. We describe a principled way of combining multiple types of edges with different edge weights and semantics using an objective graph measure called node-based assortativity. We investigate using this measure to combine text-mined links with explicit links and show that our approach significantly improves classifier performance over naively combining the two types of links.
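The guilt-by-association setup can be illustrated with a small Python sketch. It does not reproduce the paper's implementation and makes several assumptions. Mined links are built as a symmetric k-nearest-neighbor graph under cosine similarity, because the abstract does not name the similarity measure. They are combined with the explicit links by a plain weighted sum, which is the naive baseline the abstract mentions. The unlabeled nodes are then labeled by one common within-network learner, a weighted-vote relational neighbor classifier iterated with relaxation labeling. The helper names `mine_knn_links`, `combine` and `wvrn_relaxation` and the parameters `k`, `beta` and `alpha` are hypothetical.

```python
import numpy as np


def mine_knn_links(X, k):
    """Hypothetical: mined links as a symmetric k-NN graph under cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)    # row-normalize, guarding against zero rows
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)                 # a node is never its own neighbor
    W = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]             # the k most similar instances
        W[i, nbrs] = np.clip(S[i, nbrs], 0.0, None)
    return np.maximum(W, W.T)                    # undirected: keep a link found from either side


def combine(W_explicit, W_mined, w_explicit=1.0, w_mined=1.0):
    """Naive combination: a weighted sum of the two adjacency matrices."""
    return w_explicit * W_explicit + w_mined * W_mined


def wvrn_relaxation(W, labels, n_classes, n_iter=100, beta=1.0, alpha=0.99):
    """Weighted-vote relational neighbor classifier with relaxation labeling.

    labels[i] == -1 marks a node whose label must be predicted.
    """
    known = labels >= 0
    P = np.full((len(labels), n_classes), 1.0 / n_classes)  # uniform prior for unknown nodes
    P[known] = np.eye(n_classes)[labels[known]]              # known labels stay clamped
    deg = W.sum(axis=1)
    isolated = deg == 0
    for _ in range(n_iter):
        votes = (W @ P) / np.where(isolated, 1.0, deg)[:, None]  # weighted average of neighbors
        votes[isolated] = P[isolated]                            # no neighbors: keep estimate
        P[~known] = beta * votes[~known] + (1 - beta) * P[~known]
        beta *= alpha                                            # damp later updates
    return P.argmax(axis=1), P


# Toy network: explicit links form two chains, 0-1-2 and 3-4-5.
W_exp = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    W_exp[a, b] = W_exp[b, a] = 1.0
X = np.array([[1, 0], [1, 0.1], [0.9, 0], [0, 1], [0.1, 1], [0, 0.9]])  # toy node features
labels = np.array([0, -1, -1, 1, -1, -1])

W = combine(W_exp, mine_knn_links(X, k=2))
pred, P = wvrn_relaxation(W, labels, n_classes=2)
print(pred)  # [0 0 0 1 1 1]
```

Only edges carry label information here. Nodes 1, 2, 4 and 5 are therefore classified purely by their neighbors. In this toy case the mined links reinforce the explicit chains rather than adding cross-cluster edges.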
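The abstract does not define node-based assortativity. The sketch below therefore substitutes Newman's standard edge-based assortativity coefficient for categorical labels, computed over edges whose endpoints both have known labels:

r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i)

Here e_ij is the fraction of edge weight joining label i to label j, and a_i and b_i are the row and column sums of e. Each edge type is weighted by its assortativity, clipped at zero, before the graphs are summed. The functions `assortativity` and `combine_by_assortativity` are hypothetical, and so are the clipping and the per-graph max-normalization. The paper's node-based measure and its combination rule may differ.

```python
import numpy as np


def assortativity(W, labels, n_classes):
    """Newman's categorical assortativity on a weighted graph (stand-in measure).

    Only edges whose endpoints both have known labels (labels >= 0) are counted.
    """
    known = np.flatnonzero(labels >= 0)
    Y = np.eye(n_classes)[labels[known]]          # one-hot labels of known nodes
    E = Y.T @ W[np.ix_(known, known)] @ Y          # edge weight between each pair of labels
    total = E.sum()
    if total == 0:
        return 0.0                                 # no labeled edges: no evidence either way
    E = E / total                                  # e_ij: fraction of edge weight from label i to j
    a, b = E.sum(axis=1), E.sum(axis=0)
    expected = a @ b                               # sum_i a_i b_i: diagonal mass expected by chance
    if np.isclose(expected, 1.0):
        return 0.0                                 # only one label present: r undefined, treat as neutral
    return float((np.trace(E) - expected) / (1.0 - expected))


def combine_by_assortativity(graphs, labels, n_classes):
    """Hypothetical rule: weight each edge type by its clipped assortativity, then sum."""
    r = np.array([assortativity(G, labels, n_classes) for G in graphs])
    w = np.clip(r, 0.0, None)                      # non-homophilous edge types get no weight
    if w.sum() == 0:
        w = np.ones_like(w)                        # fall back to the naive unweighted sum
    # Rescale each edge type to a maximum weight of 1 before mixing.
    combined = sum(wi * G / max(G.max(), 1e-12) for wi, G in zip(w, graphs))
    return combined, w


labels = np.array([0, 0, 1, 1])
W_exp = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)  # within-class links
W_mix = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # cross-class links
W, w = combine_by_assortativity([W_exp, W_mix], labels, n_classes=2)
print(w)  # [1. 0.]
```

In this toy case the within-class links receive full weight and the cross-class links receive none, so the combined graph equals `W_exp`. On real data both edge types would typically be partly assortative, and both would contribute to the graph passed to a within-network classifier.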