Journal: Mobile Information Systems

Data Mining Technology Application in False Text Information Recognition

Abstract

False information on the Internet causes serious social harm. To recognize false text information, this paper proposes an effective method for mining text features, using false drug advertisements as the application domain. First, false and real drug advertisements were collected from official websites to build a database of both classes. Second, features were extracted from the advertisement text to build a feature matrix from the effective features, and each feature vector in the matrix was labeled positive or negative according to whether the advertisement is false. Third, several classifiers were trained and tested, the model that performed best at identifying false drug advertisements was selected, and the key features that determine the classification were identified. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. For identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set obtained after feature selection performed best. The findings offer the government an effective method for identifying and combating false advertisements, and the study serves as a reference for using text data mining to identify and detect information fraud.
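
The feature-matrix and labeling step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy advertisement texts are invented, the whitespace tokenizer stands in for whatever text processing the authors used, and the document-frequency gap is a simple stand-in for the paper's unspecified feature selection method. Classifier training (for example an SVM) is left out, since it would normally rely on a machine learning library.

```python
from collections import Counter

# Hypothetical toy corpus: (advertisement text, is_false) pairs invented for illustration.
ADS = [
    ("miracle cure guaranteed no side effects", True),
    ("guaranteed cure for diabetes in seven days", True),
    ("secret formula cure guaranteed results", True),
    ("take one tablet daily consult your doctor", False),
    ("read the label and consult your pharmacist", False),
    ("approved tablet for mild pain relief", False),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real ad text needs a proper segmenter)."""
    return text.lower().split()


def build_feature_matrix(ads):
    """Return (vocab, X, y): X[i][j] counts term j in ad i; y[i] is 1 for false ads, 0 for real."""
    vocab = sorted({tok for text, _ in ads for tok in tokenize(text)})
    index = {term: j for j, term in enumerate(vocab)}
    X, y = [], []
    for text, is_false in ads:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(text)).items():
            row[index[tok]] = n
        X.append(row)
        y.append(1 if is_false else 0)
    return vocab, X, y


def rank_features(vocab, X, y):
    """Rank terms by the gap in document frequency between false and real ads.

    This is a simple stand-in for feature selection, not the paper's method.
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    scores = {}
    for j, term in enumerate(vocab):
        df_pos = sum(1 for row, label in zip(X, y) if label == 1 and row[j] > 0)
        df_neg = sum(1 for row, label in zip(X, y) if label == 0 and row[j] > 0)
        scores[term] = df_pos / n_pos - df_neg / n_neg
    # Largest absolute gap first; ties keep alphabetical order because sort is stable.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


vocab, X, y = build_feature_matrix(ADS)
ranking = rank_features(vocab, X, y)
top_terms = [term for term, _ in ranking[:3]]
print(top_terms)
```

In this sketch, a positive score marks a term that appears mostly in the false advertisements and a negative score marks one that appears mostly in the real ones. The rows of `X` restricted to the top-ranked terms, together with `y`, are the kind of input a classifier would then be trained on.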
