Journal: British Journal of Applied Science and Technology

Experiments on the Use of Machine Learning Classification Methods in Online Crime Text Filtering and Classification



Abstract

With the exponential growth of textual information available on the Internet, there is an emerging need to extract relevant and timely knowledge about crimes from this large volume of data. The sheer size of such data makes retrieving and analyzing texts manually very difficult. Furthermore, domain-specific document classification is a hard task and suffers from low classification efficiency because domain subclasses overlap. This work focuses on finding an appropriate classification model for crime-domain knowledge on the Web. To that end, a two-level classification method for online crime text filtering and classification is used. At each level, three feature selection methods (Gini index, chi-square statistic, and information gain) and three learning methods (k-nearest neighbor, Naive Bayes (NB), and support vector machine (SVM)) are investigated. The experimental results at the first level indicate that information gain performs best for selecting crime terms, and that SVM and NB both exhibit the best performance for crime text filtering. The experimental results at the second level indicate that the Gini index performs best for selecting crime-type terms, and that the SVM classifier exhibits the best performance in classifying crime documents into their appropriate crime types.
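The term-scoring step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names (`entropy`, `information_gain`, `chi_square`, `top_terms`), and the representation of documents as token sets are all assumptions made for illustration. It uses the standard textbook definitions of information gain and the chi-square statistic for a binary "term occurs in document" feature.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a set of tokens (an assumption,
# not data from the paper). Labels mimic the first (filtering) level.
docs = [
    {"robbery", "bank", "police"},
    {"murder", "police", "suspect"},
    {"football", "match", "goal"},
    {"recipe", "cake", "sugar"},
]
labels = ["crime", "crime", "other", "other"]


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the feature 'term occurs in document'."""
    classes = sorted(set(labels))
    present = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    absent = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    h_before = entropy([labels.count(c) for c in classes])
    h_after = 0.0
    for part in (present, absent):
        weight = sum(part.values()) / len(docs)
        h_after += weight * entropy([part[c] for c in classes])
    return h_before - h_after


def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 table: term presence vs. target class."""
    pairs = list(zip(docs, labels))
    a = sum(1 for doc, lab in pairs if term in doc and lab == target)
    b = sum(1 for doc, lab in pairs if term in doc and lab != target)
    c = sum(1 for doc, lab in pairs if term not in doc and lab == target)
    d = sum(1 for doc, lab in pairs if term not in doc and lab != target)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocab = sorted({t for doc in docs for t in doc})
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]
```

On this toy corpus, `top_terms(docs, labels, 1)` selects `"police"`, the only term that separates the two classes perfectly. Under the same assumptions, the second level would reuse the same scoring with crime types as labels, applied only to documents that the first level kept as crime-related.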
