Conference: 19th International World Wide Web Conference 2010

Inferring Relevant Social Networks from Interpersonal Communication

Abstract

Researchers increasingly use electronic communication data to construct and study large social networks, effectively inferring unobserved ties (e.g., i is connected to j) from observed communication events (e.g., i emails j). Often overlooked, however, is the impact of tie definition on the corresponding network, and in turn the relevance of the inferred network to the research question of interest. Here we study the problem of network inference and relevance for two email data sets of different size and origin. In each case, we generate a family of networks parameterized by a threshold condition on the frequency of emails exchanged between pairs of individuals. After demonstrating that different choices of the threshold correspond to dramatically different network structures, we formulate the relevance of these networks in terms of a series of prediction tasks that depend on various network features. In general, we find: a) that prediction accuracy is maximized over a non-trivial range of thresholds corresponding to 5-10 reciprocated emails per year; b) that for any prediction task, choosing the optimal value of the threshold yields a sizable (~30%) boost in accuracy over naive choices; and c) that the optimal threshold value appears to be (somewhat surprisingly) consistent across data sets and prediction tasks. We emphasize the practical utility of defining ties via their relevance to the prediction task(s) at hand and discuss implications of our empirical results.
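
The threshold-parameterized family of networks described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact tie rule, so the rule used here is an assumption: a pair counts as tied when the smaller of its two directed email counts, scaled to a per-year rate, meets the threshold. The function name `thresholded_ties`, its parameters, and the toy event list are hypothetical and chosen only for illustration.

```python
from collections import Counter


def thresholded_ties(events, years, tau):
    """Return undirected ties whose reciprocated annual email rate is >= tau.

    events: iterable of (sender, recipient) pairs, one per email.
    years:  length of the observation window in years.
    tau:    threshold on reciprocated emails per year.

    Assumption (not from the paper): the reciprocated count for a pair is
    the minimum of the two directed counts.
    """
    # Count emails per directed pair, ignoring self-addressed messages.
    directed = Counter((s, r) for s, r in events if s != r)
    ties = set()
    for (s, r), n_sr in directed.items():
        n_rs = directed.get((r, s), 0)
        rate = min(n_sr, n_rs) / years
        if rate >= tau:
            ties.add(frozenset((s, r)))
    return ties


# Toy data: a and b exchange mail in both directions, a writes to c
# without reply, and b and c exchange one email each way.
events = (
    [("a", "b")] * 6
    + [("b", "a")] * 5
    + [("a", "c")] * 20
    + [("c", "b")] * 1
    + [("b", "c")] * 1
)

# One network per threshold value, all built from the same event log.
family = {tau: thresholded_ties(events, years=1.0, tau=tau) for tau in (1, 5, 10)}
for tau, ties in family.items():
    print(tau, sorted(sorted(t) for t in ties))
```

In this toy log the heavy but one-directional a-to-c traffic never forms a tie, and raising the threshold from 1 to 10 shrinks the network from two ties to none, which mirrors the abstract's point that the choice of threshold changes the inferred structure.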