Conference proceedings: Advanced Data Mining and Applications

Using Data Mining Methods to Predict Personally Identifiable Information in Emails


Abstract

Private information management and compliance are important issues for most organizations today. As a major communication tool, email is one of many potential sources of privacy leaks. Information extraction methods have been applied to detect private information in text files. However, because email messages usually consist of low-quality text, information extraction methods may not perform well at detecting private information in them. In this paper, we address the problem of predicting the presence of private information in email using data mining and text mining methods. Two prediction models are proposed. The first is based on association rules that predict one type of private information from other types of private information identified in emails. The second is based on classification models that predict private information from the content of the emails. Experiments on the Enron email dataset show promising results.
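The first model's idea, predicting one type of private information from the other types found in the same email, can be illustrated with a small sketch. The abstract does not say which rule-mining algorithm or thresholds the authors used, so everything below is an assumption made for illustration: the function name `mine_pairwise_rules`, the PII type labels, the toy data, and the support and confidence thresholds are all hypothetical. The sketch only considers rules with a single antecedent and a single consequent.

```python
from itertools import permutations


def mine_pairwise_rules(emails, min_support=0.3, min_confidence=0.7):
    """Mine rules "A -> B" over PII types that co-occur in emails.

    `emails` is a list of sets; each set holds the PII types already
    identified in one email (hypothetical labels, not from the paper).
    Returns a list of (antecedent, consequent, support, confidence).
    """
    n = len(emails)
    if n == 0:
        return []

    pii_types = sorted(set().union(*emails))
    rules = []
    for a, b in permutations(pii_types, 2):
        count_a = sum(1 for e in emails if a in e)
        count_ab = sum(1 for e in emails if a in e and b in e)
        support = count_ab / n                      # P(A and B)
        confidence = count_ab / count_a if count_a else 0.0  # P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy data, invented for illustration only.
emails = [
    {"phone", "address"},
    {"phone", "address", "email"},
    {"phone"},
    {"email"},
    {"phone", "address"},
]

rules = mine_pairwise_rules(emails, min_support=0.4, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

On this toy data, a rule such as `address -> phone` would suggest checking an email for a phone number whenever an address has been detected. A real system would likely need multi-item antecedents and thresholds tuned on labeled data.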

Bibliographic details

  • Conference location: Chengdu, China
  • Author affiliations

    Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada

    Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada

    Department of Geomatics Engineering, University of Calgary, Calgary, Alberta, Canada

    Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada

    Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada

    Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: TP311.13
