Improving Learning in Networked Data by Combining Explicit and Mined Links

Abstract

This paper is about using multiple types of information to classify networked data in a semi-supervised setting: given a fully described network (nodes and edges) in which some nodes have known labels, predict the labels of the remaining nodes. One recently developed method for this kind of inference is a guilt-by-association model, which has been developed independently in two settings: relational learning and semi-supervised learning. The relational-learning setting assumes that the networked data has explicit links, such as hyperlinks between web pages or citations between research papers. The semi-supervised setting assumes a corpus of non-relational data and creates links from similarity measures between the instances. Both use only the known labels in the network to predict the remaining labels, but they draw on very different information sources. The thesis of this paper is that combining these two types of links yields a network that carries more information than either type of link by itself. We test this thesis on six benchmark data sets using a within-network learning algorithm and show that combining the links yields significant improvements in predictive performance. We describe a principled way of combining multiple types of edges, with different edge weights and semantics, using an objective graph measure called node-based assortativity. We investigate using this measure to combine text-mined links with explicit links and show that our approach significantly improves our classifier's performance over naively combining the two types of links.
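The guilt-by-association inference described above, together with the naive link combination it is compared against, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name its within-network learner, so the sketch assumes a weighted-vote relational-neighbor (wvRN) style classifier with synchronous iterative updates. The helper names `combine_links` and `wvrn_relaxation`, the `alpha`/`beta` weights, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def combine_links(explicit, mined, alpha=1.0, beta=1.0):
    """Hypothetical naive combination: w(u, v) = alpha * w_explicit + beta * w_mined."""
    combined = defaultdict(float)
    for (u, v), w in explicit.items():
        combined[(u, v)] += alpha * w
    for (u, v), w in mined.items():
        combined[(u, v)] += beta * w
    return dict(combined)


def wvrn_relaxation(edges, known, classes, iterations=50):
    """Guilt-by-association sketch (assumed wvRN style): each unlabeled node's
    class distribution becomes the edge-weighted average of its neighbours'.

    edges: dict mapping (u, v) -> weight, treated as undirected.
    known: dict mapping node -> class label for the labeled nodes.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for (u, v), w in edges.items():
        adj[u][v] += w
        adj[v][u] += w
    nodes = set(adj) | set(known)
    # Labeled nodes are clamped to one-hot distributions; the rest start uniform.
    probs = {
        n: {c: float(c == known[n]) for c in classes} if n in known
        else {c: 1.0 / len(classes) for c in classes}
        for n in nodes
    }
    for _ in range(iterations):
        new = {}
        for n in nodes:
            total = sum(adj[n].values())
            if n in known or total == 0:
                new[n] = probs[n]  # keep clamped or isolated nodes unchanged
                continue
            new[n] = {
                c: sum(w * probs[m][c] for m, w in adj[n].items()) / total
                for c in classes
            }
        probs = new  # synchronous (Jacobi-style) update
    return probs


explicit = {("a", "b"): 1.0, ("c", "d"): 1.0}  # e.g. citation links
mined = {("b", "c"): 0.5}                       # e.g. a text-similarity link
known = {"a": "ml", "d": "db"}
probs = wvrn_relaxation(combine_links(explicit, mined), known, ["ml", "db"])
print(round(probs["b"]["ml"], 3), round(probs["c"]["ml"], 3))  # 0.75 0.25
```

In this toy graph the single mined link between `b` and `c` lets each node pick up some evidence from the other side, while each still leans toward the class of its explicit neighbor.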
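The abstract does not define node-based assortativity, so the sketch below substitutes a related, well-known measure: Newman's assortativity coefficient for categorical labels, r = (Σ_i e_ii − Σ_i a_i²) / (1 − Σ_i a_i²), computed only over edges whose endpoints both have known labels. It shows one hypothetical way such a measure could weight edge types before they are merged. The function names and the clip-at-zero weighting rule are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict


def assortativity(edges, labels):
    """Newman's categorical assortativity coefficient over labeled edges
    (a stand-in for the paper's node-based measure).

    r = (sum_i e_ii - sum_i a_i**2) / (1 - sum_i a_i**2), where e_ij is the
    fraction of edge weight joining class i to class j and a_i = sum_j e_ij.
    Edges with an unlabeled endpoint are ignored.
    """
    e = defaultdict(float)
    total = 0.0
    for (u, v), w in edges.items():
        if u in labels and v in labels:
            # Count each undirected edge in both directions so e is symmetric.
            e[(labels[u], labels[v])] += w
            e[(labels[v], labels[u])] += w
            total += 2 * w
    if total == 0:
        return 0.0
    classes = sorted({c for pair in e for c in pair})
    trace = sum(e[(c, c)] for c in classes) / total
    a = {c: sum(e[(c, d)] for d in classes) / total for c in classes}
    expected = sum(x * x for x in a.values())
    if 1.0 - expected < 1e-12:
        return 0.0  # only one class present: r is undefined, treat as neutral
    return (trace - expected) / (1.0 - expected)


def combine_by_assortativity(edge_types, labels):
    """Hypothetical weighting rule: scale each edge type by max(r, 0)."""
    combined = defaultdict(float)
    for edges in edge_types.values():
        r = max(assortativity(edges, labels), 0.0)
        for pair, w in edges.items():
            combined[pair] += r * w
    return dict(combined)


labels = {"a": "ml", "b": "ml", "c": "db", "d": "db"}
explicit = {("a", "b"): 1.0, ("c", "d"): 1.0}  # links stay within a class
mined = {("a", "c"): 1.0, ("b", "d"): 1.0}     # links cross classes
print(assortativity(explicit, labels), assortativity(mined, labels))  # 1.0 -1.0
print(combine_by_assortativity({"explicit": explicit, "mined": mined}, labels))
```

Clipping at zero is one simple design choice: an edge type whose links join different classes more often than chance would pull a guilt-by-association classifier toward wrong labels, so this sketch gives it zero weight rather than a negative one.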
