Journal: 《计算机应用与软件》 (Computer Applications and Software)

A Study of the kNN Method for Short Message Filtering Systems with Unbalanced Training Sets

         

Abstract

The proliferation of unwanted short messages seriously harms the social climate and disrupts people's daily lives, so research on filtering technology for harmful short messages has considerable practical value. In this paper, the ICTCLAS word segmentation system developed by the Institute of Computing Technology of the Chinese Academy of Sciences is combined with keyword extraction based on the TF-IDF term-weighting metric to convert short message text into feature vectors. The kNN method is then used to determine the category of each message, thereby filtering out harmful messages. In addition, to address the unbalanced distribution of the training set, a density-based improved method is applied, which deals fairly effectively with the tendency of the original classification results to favor the categories with many samples. Experiments show that the accuracy of the improved method is about 79.18%, an improvement of about 1.23% over the original method. The method filters unwanted short messages relatively effectively and has a certain practical value.
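The pipeline described in the abstract (segmented text, TF-IDF feature vectors, kNN voting) can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's implementation: the messages are taken to be already segmented into tokens (ICTCLAS is not called), the toy tokens and labels are hypothetical, the smoothed IDF formula is one common variant, and the `balance=True` option divides each vote by the size of its class as a simple stand-in for the density-based improvement, whose exact form the abstract does not give.

```python
import math
from collections import Counter, defaultdict

def vectorize(tokens, idf):
    """Turn a token list into an L2-normalized TF-IDF vector (dict)."""
    tf = Counter(t for t in tokens if t in idf)  # unseen tokens are ignored
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def fit_tfidf(docs):
    """docs: list of token lists. Returns (training vectors, idf table)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant, not taken from the paper).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf

def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_classify(query, train_vecs, labels, k=3, balance=False):
    """Similarity-weighted kNN vote over the k nearest training vectors.

    With balance=True each vote is divided by the size of its class,
    an assumed stand-in for the paper's density-based correction.
    """
    scored = [(cosine(query, v), y) for v, y in zip(train_vecs, labels)]
    nearest = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    class_size = Counter(labels)
    votes = defaultdict(float)
    for sim, y in nearest:
        votes[y] += sim / class_size[y] if balance else sim
    return max(votes, key=votes.get)

# Hypothetical, pre-segmented toy training set: 3 normal messages, 1 harmful.
train_docs = [["call", "mom"], ["call", "boss"], ["call", "taxi"],
              ["call", "prize"]]
train_labels = ["ham", "ham", "ham", "spam"]
train_vecs, idf = fit_tfidf(train_docs)

query = vectorize(["call", "prize", "mom"], idf)
plain = knn_classify(query, train_vecs, train_labels, k=4)
balanced = knn_classify(query, train_vecs, train_labels, k=4, balance=True)
```

In this toy set the query is equally similar to one normal and one harmful message, so the plain vote is decided by the sheer number of normal neighbors, while the class-size-normalized vote is not. That is the kind of majority-class bias the abstract refers to, although the paper's own correction may work differently.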
