Conference: Information and communication technology

An Efficient Unsavory Data Detection Method for Internet Big Data

Abstract

With the explosion of information technologies, the volume and diversity of data in cyberspace are growing rapidly; meanwhile, unsavory data are harming the security of the Internet. Detecting unsavory data in Internet big data based on their inner semantic information is therefore of growing importance. In this paper, we propose the i-Tree method, an intelligent semantics-based unsavory data detection method for Internet big data. Firstly, the Internet big data are mapped into a high-dimensional feature space, where they are represented as high-dimensional points. Secondly, to address the "curse of dimensionality" in the high-dimensional feature space, principal component analysis (PCA) is used to reduce the dimensionality of the feature space. Thirdly, in the newly generated feature space, we cluster the data objects, transform the data clusters into regular unit hyper-cubes, and create a one-dimensional index for the data objects based on the idea of multi-dimensional indexing. Finally, we realize semantics-based data detection for a given unsavory data object using a similarity search algorithm; the experimental results show that our method achieves much better efficiency.
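
The four steps in the abstract can be illustrated with a minimal sketch. This is not the i-Tree method itself, whose details the abstract does not give. As assumptions for illustration: random vectors stand in for the semantic features, PCA is approximated by power iteration, clustering is plain k-means, and the one-dimensional key is an iDistance-style value (cluster number times a stride, plus distance to the cluster centroid) used in place of the unit hyper-cube transformation. All function names (fit_pca, project, kmeans, build_index, range_search) are hypothetical.

```python
import bisect
import math
import random


def fit_pca(points, k, iters=100):
    """Return (mean, components): approximate top-k principal axes,
    found by power iteration with deflation."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    rows = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]
    rng, comps = random.Random(0), []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def project(p, mean, comps):
    """Map one point into the reduced feature space."""
    return [sum((p[j] - mean[j]) * c[j] for j in range(len(p))) for c in comps]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic, evenly spaced seeds."""
    cents = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, cents[i]))].append(p)
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else c
                 for g, c in zip(groups, cents)]
    return cents


def build_index(points, cents):
    """Assumed key scheme: cluster_id * stride + distance to own centroid."""
    assign = [min(range(len(cents)), key=lambda i: dist(p, cents[i])) for p in points]
    dists = [dist(p, cents[a]) for p, a in zip(points, assign)]
    stride = max(dists) + 1.0  # keeps key ranges of different clusters disjoint
    entries = sorted((a * stride + d, i) for i, (a, d) in enumerate(zip(assign, dists)))
    return {"keys": [e[0] for e in entries], "ids": [e[1] for e in entries],
            "stride": stride, "cents": cents, "points": points}


def range_search(index, q, r, eps=1e-9):
    """Return sorted ids of points within distance r of q.
    By the triangle inequality, a match in cluster i has a centroid
    distance in [d(q, c_i) - r, d(q, c_i) + r], so only that key range
    is scanned; every candidate is then checked exactly."""
    hits = []
    for i, c in enumerate(index["cents"]):
        dq, base = dist(q, c), i * index["stride"]
        lo = bisect.bisect_left(index["keys"], base + max(0.0, dq - r) - eps)
        hi = bisect.bisect_right(
            index["keys"], base + min(dq + r, index["stride"] - 0.5) + eps)
        for j in range(lo, hi):
            idx = index["ids"][j]
            if dist(index["points"][idx], q) <= r:
                hits.append(idx)
    return sorted(hits)


# Synthetic stand-in for the feature vectors (8 dimensions, unequal variances).
rng = random.Random(42)
raw = [[rng.gauss(0, j + 1) for j in range(8)] for _ in range(300)]
mean, comps = fit_pca(raw, k=3)
reduced = [project(p, mean, comps) for p in raw]
index = build_index(reduced, kmeans(reduced, k=4))
query = project(raw[0], mean, comps)  # a known "unsavory" object as the query
found = range_search(index, query, r=3.0)
```

Here `found` holds the positions of all objects whose reduced feature vectors lie within the chosen radius of the query. The sorted key list plays the role that a B+-tree would typically play in a disk-based index; the radius, cluster count, and target dimensionality are arbitrary values chosen for the example.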
