Towards privacy preserving unstructured big data publishing

Mehta Brijesh; Rao Udai Pratap; Gupta Ruchika; Conti Mauro


Abstract

Various sources and sophisticated tools are used to gather and process comparatively large volumes of data (big data), which sometimes leads to privacy disclosure (at a broader or finer level) for the data owner. Privacy preserving data publishing approaches such as k-anonymity, l-diversity, and t-closeness are widely used to de-identify data; however, the chance of re-identification of attributes always exists, because data is collected from multiple sources such as the public web, social media, Internet whereabouts, and sensors, all of which are highly prone to data linkage. In the literature, k-anonymity stands out as one of the most popular mainstream data anonymization approaches, and it can also be used for large-sized data. However, applying k-anonymization to a variety of data (especially unstructured data) is difficult in the traditional way, because it requires the given data to be classified into personal data, quasi-identifiers, and sensitive data. We identify existing approaches from the Natural Language Processing (NLP) literature that convert unstructured data to structured form, so that k-anonymization can be applied over the generated structured records. We adopt a two-phase Conditional Random Field (CRF) based Named Entity Recognition (NER) approach to represent unstructured data in structured form. Further, we propose Improved Scalable k-Anonymization (ImSKA) to anonymize the well-represented unstructured data, thereby achieving privacy preserving unstructured big data publishing. We compare both proposed approaches, NER and ImSKA, with existing approaches, and the results show that our proposed solutions outperform the existing approaches in terms of F1 score and Normalized Cardinality Penalty (NCP), respectively. Since NER approaches are widely used for bio-medical datasets, we have also used a well-known Bio-NER dataset, the GENIA corpus, for measuring performance.
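To make the two core notions in the abstract concrete, the following is a minimal sketch of a k-anonymity check and an NCP-style information-loss score over already-structured records. It is not ImSKA and not the authors' code. The toy records, the bucket widths, the domain ranges, and all function names are assumptions made for illustration, and NCP is taken here as interval width divided by domain range, averaged over attributes and records, which is one common formulation; the abstract does not give the paper's exact definition.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code) act as quasi-identifiers,
# the third field is the sensitive value.
records = [
    (23, 47601, "flu"),
    (27, 47605, "cold"),
    (35, 47712, "asthma"),
    (38, 47718, "flu"),
]

def generalize(value, width):
    """Map a number to the half-open bucket [lo, lo + width) containing it."""
    lo = (value // width) * width
    return (lo, lo + width)

def anonymize(rows, age_width, zip_width):
    """Replace each quasi-identifier by its bucket; keep the sensitive value."""
    return [(generalize(a, age_width), generalize(z, zip_width), s)
            for a, z, s in rows]

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter((a, z) for a, z, _ in rows)
    return all(count >= k for count in groups.values())

def ncp(rows, age_range, zip_range):
    """Average of (interval width / domain range) over attributes and records."""
    total = 0.0
    for (a_lo, a_hi), (z_lo, z_hi), _ in rows:
        total += (a_hi - a_lo) / age_range + (z_hi - z_lo) / zip_range
    return total / (2 * len(rows))

# Assumed bucket widths (10 years, 100 zip codes) and domain ranges (100, 1000).
anonymized = anonymize(records, age_width=10, zip_width=100)
two_anonymous = is_k_anonymous(anonymized, k=2)
loss = ncp(anonymized, age_range=100, zip_range=1000)
```

Wider buckets make it easier to reach a given k but raise the loss score, which is the trade-off an anonymization algorithm has to balance; a lower NCP at the same k means less information was given up.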


Bibliographic details

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology, 2019, Issue 4, 12 pages
Authors
Mehta Brijesh; Rao Udai Pratap; Gupta Ruchika; Conti Mauro
Author affiliations

Maharana Pratap Univ Agr & Technol, Coll Technol & Engn, Dept Comp Sci & Engn, Udaipur, Rajasthan, India;

Sardar Vallabhbhai Natl Inst Technol, Comp Engn Dept, Surat, Gujarat, India;

Chandigarh Univ, Comp Sci & Engn Dept, Mohali, Punjab, India;

Univ Padua, Dept Math, Padua, Italy;

Original format: PDF
Language: English
Chinese Library Classification: Automation systems
Keywords
Privacy preserving big data publishing; unstructured data privacy; named entity recognition; k-anonymity; scalable k-anonymization;

