Journal: Future Generation Computer Systems

Generation of overlapping clusters constructing suitable graph for crime report analysis

Abstract

Cybercrime is a kind of criminal activity generally committed by cybercriminals or hackers. Criminal activity is growing explosively all over the world, which motivates law enforcement agencies to analyze crimes systematically. In many cases, crime information is stored as unstructured online text reports, and a single report may describe several different criminal activities. Analyzing these reports to identify patterns and trends in crime, and to devise crime detection and prevention strategies, is a very challenging task. In this paper, the crime reports are preprocessed and relations among named-entity pairs are extracted to give the reports a structured form. Each extracted relation is converted to an n-dimensional real-valued vector based on the Word2Vec model from natural language processing. A novel agglomerative graph partitioning algorithm that uses various graph centrality measures is then applied to partition the extracted relations. All extracted relations of a report that fall in a single partition are replaced by the representative of that partition, so each report is described by a set of distinct relation types. Next, a graph is constructed for the set of reports such that the nodes correspond to the tuples of relations that describe the reports, and an edge is drawn between a pair of nodes only if the corresponding relations are of a similar type and come from two different reports. The constructed graph is disconnected, and each of its connected components is a clique. These cliques are easily identified in time linear in the number of edges of the graph, and each clique yields a cluster of reports. Because each report is described by a set of relations of different types, the resulting clusters overlap. The degree of membership of a report in a cluster is also identified in the paper. The proposed method is evaluated experimentally and compared with several state-of-the-art partition-based and overlapping clustering algorithms to demonstrate its effectiveness on crime corpora.
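
The graph construction and clique-based clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each report has already been reduced to a set of relation-type labels (the partition representatives), and that each node is a (report, relation type) pair. The function names and the sample labels are hypothetical, and the Word2Vec embedding and agglomerative partitioning steps are not reproduced.

```python
from collections import defaultdict


def build_graph(report_relations):
    """Build an adjacency map over (report, relation_type) nodes.

    Assumption: report_relations maps a report id to a set of
    relation-type labels. Two nodes are joined only when they carry
    the same relation type and belong to different reports.
    """
    nodes = [(r, t) for r, types in report_relations.items() for t in types]
    adj = {n: set() for n in nodes}
    by_type = defaultdict(list)
    for node in nodes:
        by_type[node[1]].append(node)
    for group in by_type.values():
        for i, u in enumerate(group):
            for v in group[i + 1:]:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def connected_components(adj):
    """Iterative depth-first search; runs in O(nodes + edges)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def overlapping_clusters(report_relations):
    """Map each relation type to the sorted list of reports in its clique."""
    adj = build_graph(report_relations)
    clusters = {}
    for component in connected_components(adj):
        relation_type = component[0][1]  # all nodes in a component share one type
        clusters[relation_type] = sorted(report for report, _ in component)
    return clusters


# Hypothetical input: three reports with made-up relation-type labels.
reports = {
    "r1": {"theft", "fraud"},
    "r2": {"fraud"},
    "r3": {"theft", "assault"},
}
print(overlapping_clusters(reports))
```

In this toy input, report r1 appears in both the "theft" and the "fraud" cluster, which is the sense in which the clusters overlap. A relation type that occurs in only one report forms an isolated node, that is, a clique of size one.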