Conference: International Workshop on Digital-Forensics and Watermarking

Anonymizing Temporal Phrases in Natural Language Text to be Posted on Social Networking Services


Abstract

Time-related information in text posted online is one type of personal information targeted by attackers, which is one reason that sharing information online can be risky. Time information should therefore be anonymized before it is posted on social networking services (SNSs). One approach to anonymizing information is to replace sensitive phrases with anonymous phrases, but attackers can usually spot such anonymization because it reads unnaturally. Another approach is to detect temporal passages in the text and remove them, but removing these passages can make the text read unnaturally. We have developed an algorithm that anonymizes time-related personal information by removing temporal passages when doing so will not change the natural meaning of the message. Temporal phrases are detected using machine-learned patterns, each represented by a subtree of the sentence's parse tree. Temporal phrases in the parse tree are distinguished from other parts of the tree by temporal taggers integrated into the algorithm. In an experiment with 4008 sentences posted on a social network, 84.53% were anonymized without changing their intended meaning. This is significantly better than the 72.88% rate of the best previous temporal phrase detection algorithm. Of the learned patterns, the ten most common were used to detect 87.78% of the temporal phrases. This means that a small set of the most common patterns is enough to anonymize the temporal phrases in most messages to be posted on an SNS. The algorithm works well not only for temporal phrases in text posted on social networks but also for other types of phrases (such as location and objective ones), other areas (religion, politics, military, etc.), and other languages.
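The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: match a subtree pattern against a parse tree and drop the matching temporal phrase. The `Node` class, the single hand-written pattern in `PATTERNS`, and the `temporal` flag (a stand-in for the output of a temporal tagger) are all assumptions made for illustration. The pattern learning and the check that removal preserves the meaning of the message are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node; not the paper's data structure."""
    label: str                                  # constituent or POS label, e.g. "PP"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    temporal: bool = False                      # assumed to come from a temporal tagger


def leaf(label: str, word: str, temporal: bool = False) -> Node:
    return Node(label, [], word, temporal)


def is_temporal(node: Node) -> bool:
    """True if every leaf under `node` was marked temporal."""
    if node.word is not None:
        return node.temporal
    return bool(node.children) and all(is_temporal(c) for c in node.children)


# A pattern is (parent label, child labels, index of the child that must be temporal).
# This single pattern is hand-written for the example, not a learned one.
Pattern = Tuple[str, Tuple[str, ...], int]
PATTERNS: List[Pattern] = [("PP", ("IN", "NP"), 1)]


def matches(node: Node, pattern: Pattern) -> bool:
    label, child_labels, temporal_index = pattern
    if node.label != label:
        return False
    if tuple(c.label for c in node.children) != child_labels:
        return False
    return is_temporal(node.children[temporal_index])


def anonymize(node: Node, patterns: List[Pattern]) -> Node:
    """Return a copy of the tree without subtrees that match any pattern."""
    kept = [
        anonymize(child, patterns)
        for child in node.children
        if not any(matches(child, p) for p in patterns)
    ]
    return Node(node.label, kept, node.word, node.temporal)


def words(node: Node) -> List[str]:
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]


# "I will fly from Tokyo on Monday": both PPs have the shape PP -> IN NP,
# but only "on Monday" contains a leaf marked temporal.
tree = Node("S", [
    Node("NP", [leaf("PRP", "I")]),
    Node("VP", [
        leaf("MD", "will"),
        Node("VP", [
            leaf("VB", "fly"),
            Node("PP", [leaf("IN", "from"), Node("NP", [leaf("NNP", "Tokyo")])]),
            Node("PP", [leaf("IN", "on"),
                        Node("NP", [leaf("NNP", "Monday", temporal=True)])]),
        ]),
    ]),
])

print(" ".join(words(anonymize(tree, PATTERNS))))  # I will fly from Tokyo
```

In this toy example the two prepositional phrases have the same shape, so the structural pattern alone cannot tell them apart. The temporal flag is what restricts removal to "on Monday", which mirrors the role the abstract assigns to the integrated temporal taggers.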
