International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery
A Binary Feature Extraction Based Data Provenance System Implemented on Flink Platform

Abstract

Data protection and information-flow control are basic requirements for the secure operation of enterprises and organizations. Document data provenance records how a specific document is transmitted so that its origin can be traced afterwards. Although it is an important function of enterprise information-security control, it has long suffered from high management costs. This paper therefore proactively monitors the enterprise's internal network traffic, restores the content of transmitted documents from that traffic, and uses the proposed algorithm to accurately identify each document's parent document, freeing provenance from the constraints of traditional document tracing. To keep the restoration of streaming data flexible and scalable, the algorithm modules are built on Flink, a stream processing platform, to which the key computing services are migrated. Capture agents deployed at key network nodes collect traffic data and feed it into the stream processing system through a message queue. The stream processing system restores files with a document restoration algorithm and hands each file to the feature extraction module. Once the feature extraction module has analyzed a file, the file is stored in a file system or a structured data store, where it awaits document tracking requests. The resulting system is completely separated from the enterprise's daily business, and the additional load it places on internal network traffic is very small. Leveraging Flink's distributed processing capabilities, the system produced satisfactory data provenance results in the experiments.
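The abstract does not detail the document restoration algorithm. As a minimal sketch of the first step any such algorithm needs, assume each capture agent emits hypothetical `(flow_id, seq, payload)` records for TCP segments. Restoration then means ordering each flow's segments by sequence number, discarding retransmitted duplicates and already-covered bytes, and stopping at a gap instead of guessing missing data. `Segment` and `reassemble` are illustrative names, not the paper's; the sketch ignores 32-bit sequence-number wraparound and the application-protocol parsing (for example, pulling a file out of an HTTP exchange) that a real system would still need. In a Flink job, the per-flow grouping would roughly correspond to keying the stream by flow and buffering segments in keyed state.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record from a capture agent: (flow_id, seq, payload).
Segment = Tuple[str, int, bytes]


def reassemble(segments: Iterable[Segment]) -> Dict[str, bytes]:
    """Rebuild each flow's byte stream from out-of-order, duplicated,
    or overlapping segments, stopping at the first missing range."""
    by_flow: Dict[str, List[Tuple[int, bytes]]] = defaultdict(list)
    for flow_id, seq, payload in segments:
        by_flow[flow_id].append((seq, payload))

    streams: Dict[str, bytes] = {}
    for flow_id, parts in by_flow.items():
        parts.sort(key=lambda p: p[0])
        next_seq = parts[0][0]  # assume the lowest seq starts the stream
        out = bytearray()
        for seq, payload in parts:
            if seq > next_seq:
                break  # gap: a segment was not captured, so stop here
            overlap = next_seq - seq  # bytes of this payload already emitted
            if overlap < len(payload):
                out += payload[overlap:]
                next_seq = seq + len(payload)
        streams[flow_id] = bytes(out)
    return streams


captured = [
    ("f1", 105, b"world"),
    ("f1", 100, b"hello"),
    ("f1", 100, b"hello"),  # retransmission, dropped
    ("f1", 103, b"lowo"),   # overlaps bytes 103-104, only "wo" is new
]
print(reassemble(captured))  # {'f1': b'helloworld'}
```

Stopping at the first gap keeps a partially captured file from being passed on with silently missing bytes; a production version might instead wait for late segments before giving up.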
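The abstract also does not specify which binary features are extracted or how the parent document is matched. One common way to fingerprint raw file bytes, swapped in here purely for illustration, is to hash every k-byte window, keep only hashes that are 0 mod p, and compare documents by the overlap of their fingerprint sets. The sketch assumes that technique with CRC32 hashes and Jaccard similarity; `binary_features`, `jaccard`, `find_parent`, `k`, `mod`, and the 0.3 threshold are hypothetical choices, not the authors'.

```python
import zlib
from typing import Dict, Optional, Set, Tuple


def binary_features(data: bytes, k: int = 8, mod: int = 4) -> Set[int]:
    """Fingerprint raw bytes: CRC32 of every k-byte window, keeping only
    hashes that are 0 mod `mod`. Selection depends on content, not offset,
    so a byte region copied into another file yields the same hashes."""
    feats: Set[int] = set()
    for i in range(len(data) - k + 1):
        h = zlib.crc32(data[i:i + k])
        if h % mod == 0:
            feats.add(h)
    return feats


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two feature sets in [0, 1]; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_parent(child: Set[int], index: Dict[str, Set[int]],
                threshold: float = 0.3) -> Optional[Tuple[str, float]]:
    """Return (doc_id, score) for the stored document most similar to
    `child`, or None when no candidate reaches `threshold`."""
    best: Optional[Tuple[str, float]] = None
    for doc_id, feats in index.items():
        score = jaccard(child, feats)
        if score >= threshold and (best is None or score > best[1]):
            best = (doc_id, score)
    return best


# Tiny demo: mod=1 keeps every window because the inputs are short;
# a larger mod thins the index for real files.
parent = b"Quarterly revenue report: confidential figures for internal review only."
child = parent + b" Forwarded with added notes."
other = b"Lunch menu for Friday: soup, salad, and sandwiches for the whole team."

index = {
    "report.docx": binary_features(parent, mod=1),
    "menu.txt": binary_features(other, mod=1),
}
print(find_parent(binary_features(child, mod=1), index))
```

Jaccard similarity is symmetric, so it only indicates that two documents are related. Deciding which one is the parent needs extra evidence; one plausible assumption is that the document observed earlier in the captured traffic is the parent.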