Published in: Information Forensics and Security, IEEE Transactions on

Where You Are Is Who You Are: User Identification by Matching Statistics



Abstract

Most users of online services have unique behavioral or usage patterns. These patterns can be exploited to identify and track users from their observed behavior alone. We study the task of identifying users from statistics of their behavioral patterns. In particular, we focus on the setting in which we are given histograms of users’ data collected during two different experiments. We assume that, in the first data set, the users’ identities are anonymized or hidden and that, in the second data set, their identities are known. We study the task of identifying the users by matching the histograms of their data in the first data set with the histograms from the second data set. Recent work introduced the optimal algorithm for this user identification task. In this paper, we evaluate the effectiveness of this method on three different types of data sets with up to 50,000 users, and in multiple scenarios. Using data sets such as call data records, web browsing histories, and GPS trajectories, we demonstrate that a large fraction of users can be easily identified given only histograms of their data; hence, these histograms can act as users’ fingerprints. We also verify that simultaneous identification of users achieves better performance than one-by-one user identification. Furthermore, we show that the optimal identification method indeed gives higher accuracy than heuristics-based approaches in practical scenarios. The accuracy obtained under this optimal method can thus be used to quantify the maximum level of user identification that is possible in such settings. We show that the key factors affecting the accuracy of the optimal identification algorithm are the duration of the data collection, the number of users in the anonymized data set, and the resolution of the data set. We also analyze the effectiveness of
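The difference between one-by-one and simultaneous identification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy histograms, the user names, the function names, and the use of L1 distance between normalized histograms as the matching cost. It is not the optimal algorithm the abstract refers to, only a small stand-in that shows how a joint, one-to-one matching can resolve a case where independent nearest-histogram lookups assign two anonymous users to the same identity.

```python
from itertools import permutations


def normalize(counts):
    """Turn raw histogram counts into an empirical distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def l1_distance(p, q):
    """L1 distance between two distributions (an assumed stand-in cost)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cost_matrix(anon, known, names):
    """cost[i][j] = distance between anonymous histogram i and known user j."""
    return [
        [l1_distance(normalize(h), normalize(known[n])) for n in names]
        for h in anon
    ]


def match_one_by_one(anon, known):
    """Label each anonymous histogram independently with its nearest user."""
    names = list(known)
    cost = cost_matrix(anon, known, names)
    return [names[min(range(len(names)), key=row.__getitem__)] for row in cost]


def match_simultaneously(anon, known):
    """Pick the one-to-one assignment with the smallest total cost.

    Brute force over permutations, so only suitable for a handful of users.
    """
    names = list(known)
    cost = cost_matrix(anon, known, names)
    best = min(
        permutations(range(len(names)), len(anon)),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return [names[j] for j in best]


# Hypothetical toy data: counts over three bins (e.g. three locations).
known = {"alice": [80, 10, 10], "bob": [50, 40, 10], "carol": [10, 10, 80]}
anon = [[70, 20, 10], [68, 22, 10], [10, 20, 70]]

print(match_one_by_one(anon, known))      # ['alice', 'alice', 'carol']
print(match_simultaneously(anon, known))  # ['alice', 'bob', 'carol']
```

In this toy data the second anonymous histogram is slightly closer to alice than to bob, so the independent lookup labels two histograms as alice, while the joint matching is forced to use each identity once. The brute-force search grows factorially with the number of users; for data sets of realistic size, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be a more practical choice.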
