Journal: Expert Systems with Applications

Towards filtering undesired short text messages using an online learning approach with semantic indexing


Abstract

The popularity and reach of short text messages in electronic communication have led spammers to use them to propagate undesired content. Such content often consists of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning and, therefore, the most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, to the context of filtering undesired short text messages. The proposed approach supports incremental learning; its predictive model is therefore scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. Besides being dynamic, these messages are short and usually poorly written, rife with slang, symbols, and abbreviations that hinder text representation, learning, and filtering. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We showed that these techniques can improve the quality of the text content and, consequently, enhance the performance of expert systems for spam detection. Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by classifiers using the original text samples along with variations of those samples created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicate that it is effective at filtering undesired short text messages. (C) 2017 Elsevier Ltd. All rights reserved.
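
The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: MDLText is replaced here by a small incremental multinomial naive Bayes classifier, semantic indexing is omitted, and the slang lexicon, the class names (`OnlineNaiveBayes`, `VariantEnsemble`), and the toy messages are all hypothetical, introduced only for illustration. The sketch shows one online learner per text variant (original and normalized), each updated one message at a time, with their class probabilities summed to reach a decision.

```python
import math
import re
from collections import Counter

# Hypothetical slang lexicon, used only for this illustration.
SLANG = {"u": "you", "ur": "your", "txt": "text", "2": "to", "4": "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(text):
    """Replace known slang tokens with their standard form."""
    return " ".join(SLANG.get(tok, tok) for tok in tokenize(text))


class OnlineNaiveBayes:
    """Incremental multinomial naive Bayes (a stand-in, not MDLText)."""

    def __init__(self):
        self.class_counts = Counter()  # messages seen per class
        self.token_counts = {}         # class -> Counter of tokens
        self.vocab = set()

    def learn(self, text, label):
        """Update the counts with a single labeled message."""
        tokens = tokenize(text)
        self.class_counts[label] += 1
        self.token_counts.setdefault(label, Counter()).update(tokens)
        self.vocab.update(tokens)

    def predict_proba(self, text):
        """Return a dict of class -> posterior probability."""
        tokens = tokenize(text)
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            # Add-one smoothing; the extra 1 reserves mass for unseen tokens.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        if not scores:
            return {}
        top = max(scores.values())
        exp = {k: math.exp(v - top) for k, v in scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


class VariantEnsemble:
    """One online classifier per text variant, combined by summed probabilities."""

    def __init__(self, transforms):
        self.members = [(t, OnlineNaiveBayes()) for t in transforms]

    def learn(self, text, label):
        for transform, model in self.members:
            model.learn(transform(text), label)

    def predict(self, text):
        totals = Counter()
        for transform, model in self.members:
            for label, p in model.predict_proba(transform(text)).items():
                totals[label] += p
        return max(totals, key=totals.get) if totals else None


# Toy message stream (invented data), processed one message at a time.
ensemble = VariantEnsemble([lambda t: t, normalize])
stream = [
    ("win a free prize now txt 4 cash", "spam"),
    ("are u coming to dinner tonight", "ham"),
    ("free cash prize claim now", "spam"),
    ("see you at the meeting tomorrow", "ham"),
]
for message, label in stream:
    ensemble.learn(message, label)

print(ensemble.predict("claim ur free prize"))
```

Because the ensemble only calls `learn` and `predict_proba` on its members, any classifier exposing those two methods could be substituted, which mirrors the abstract's claim that the approach is independent of the classification method.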
