Autoencoder-based information content preserving data anonymization method and system

Abstract

A method of providing an auto-encoder for anonymizing data associated with a population of entities is disclosed. The method includes providing a computer system with a memory storing specific computer-executable instructions for a neural network. The neural network includes an input layer of nodes; three or more layers of nodes; and an output layer of nodes to provide an encoded output vector. The second layer of nodes has more nodes than the first and third layers of nodes. The method also includes identifying a plurality of characteristics associated with the entities and preparing a plurality of input vectors that include a characteristic. The characteristics appear in the input vector as transformed numeric information from human recognizable text. The method includes training the neural network during a plurality of training cycles comprising: processing an input vector with the neural network to provide an encoded output vector; determining an output vector reconstruction error by calculating a function of the encoded output vector and the input vector; back-propagating the output vector reconstruction error back through the neural network; and recalibrating a weight to minimize the output vector reconstruction error. Additional neural networks are also disclosed. The outputs of the additional neural networks may be combined. Encoded output vectors may be compared to identify a common characteristic between two or more entities or to identify two or more entities with the common characteristic. An auto-encoder system for anonymizing data is also disclosed.
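
The training cycle described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the patented implementation: the byte-folding text transform, the tanh hidden units, the linear output layer, the mean squared error used as the reconstruction function, the layer sizes, and the sample characteristic values are all assumptions that the source does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_vector(text, dim=8):
    # Hypothetical transform: fold UTF-8 bytes of human-readable text
    # into a fixed-length numeric vector with entries in [0, 1].
    vec = np.zeros(dim)
    for i, b in enumerate(text.encode("utf-8")):
        vec[i % dim] += b / 255.0
    return vec / max(1.0, vec.max())

def init_network(sizes):
    # One (weights, bias) pair per layer transition.
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Returns the activations of every layer; the last entry is the output vector.
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        is_output = i == len(params) - 1
        acts.append(z if is_output else np.tanh(z))
    return acts

def train_step(params, x, lr=0.02):
    acts = forward(params, x)
    error = acts[-1] - x                  # output vector vs. input vector
    loss = float(np.mean(error ** 2))     # assumed reconstruction error (MSE)
    delta = 2.0 * error / x.size          # gradient at the linear output layer
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = np.outer(acts[i], delta)
        grad_b = delta
        if i > 0:
            # Back-propagate through the tanh of the previous layer,
            # using the weights as they were before this update.
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
        params[i] = (W - lr * grad_W, b - lr * grad_b)  # recalibrate weights
    return loss

# Hypothetical characteristic values for four entities.
inputs = [text_to_vector(t) for t in
          ["Reykjavik", "retail", "wire transfer", "corporate"]]

# Input layer, three inner layers (the second wider than the first and third),
# and an output layer matching the input size.
params = init_network([8, 4, 6, 4, 8])

losses = []
for cycle in range(500):
    losses.append(np.mean([train_step(params, x) for x in inputs]))

print(f"mean reconstruction error: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

After training, the output vectors returned by `forward(params, x)[-1]` for different entities could be compared (for example with a distance or similarity measure) to look for shared characteristics, which is the comparison step the abstract mentions.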

Bibliographic data

Publication number: US11227067B2

Publication date: 2022-01-18

Original format: PDF
Applicant/Assignee: LUCINITY EHF

Application number: US202017020453
Inventors: JUSTIN BERCICH; THERESA BERCICH; GUDMUNDUR RUNAR KRISTJANSSON; ANUSH VASUDEVAN

Filing date: 2020-09-14
Classification: G06F21/62; G06N3/04; G06N3/08
Country: US
Date added to database: 2022-08-24 23:23:12
