Conference: IEEE International Conference on Data Mining Workshops

Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach

Abstract

The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to the Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the Internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII across heterogeneous online platforms on both the surface web and the dark web. However, because PII records on each platform are incomplete and inaccurate, linking exposed PII to individual users is a non-trivial task. While Entity Resolution (ER) techniques can facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity-matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance existing privacy risk assessment with a DL-based ER method, Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across different online platforms on the dark web and the surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Applying MCA to a random sample of data breach victims from the dark web, we identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
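To make the contrast with "ad-hoc, manual rule development" concrete, here is a minimal sketch of what a hand-written ER rule for incomplete PII records might look like. It is NOT the paper's MCA model. The field names (`name`, `email`, `username`, `phone`), the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.8 match threshold are all illustrative assumptions. Missing fields are skipped, which is exactly where such rules become brittle and where learned features are meant to help.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical PII schema: any field may be missing (None) on a given platform.
FIELDS = ("name", "email", "username", "phone")


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict) -> Optional[float]:
    """Average similarity over fields populated in BOTH records.

    Returns None when the records share no populated field, since this
    rule has nothing to compare for such an incomplete pair.
    """
    scores = [
        field_similarity(r1[f], r2[f])
        for f in FIELDS
        if r1.get(f) and r2.get(f)
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


def is_match(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Hand-tuned decision rule; the 0.8 threshold is an arbitrary assumption."""
    score = record_similarity(r1, r2)
    return score is not None and score >= threshold


if __name__ == "__main__":
    # Hypothetical records: one leaked on the dark web, one from a surface web profile.
    dark = {"name": "John A. Smith", "email": "jsmith@example.com"}
    surface = {"name": "john smith", "email": "jsmith@example.com", "username": "jsmith"}
    print(record_similarity(dark, surface), is_match(dark, surface))
```

Every choice here (which fields to compare, how to weight them, where to set the threshold) must be tuned by hand for each pair of platforms, which is the engineering burden the abstract attributes to conventional ER.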
