Towards filtering undesired short text messages using an online learning approach with semantic indexing

Silva Renato M.; Alberto Tulio C.; Almeida Tiago A.; Yamakami Akebo

Abstract

The popularity and reach of short text messages commonly used in electronic communication have led spammers to use them to propagate undesired content. This content often consists of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning; therefore, most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, to the context of filtering undesired short text messages. The proposed approach supports incremental learning; therefore, its predictive model is scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. In addition to their dynamic nature, these messages are short and usually poorly written, rife with slang, symbols, and abbreviations that hinder text representation, learning, and filtering. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We showed that these techniques can improve the quality of the text content and, consequently, enhance the performance of expert systems for spam detection. Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicated that it is efficient at filtering undesired short text messages. (C) 2017 Elsevier Ltd. All rights reserved.
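The abstract describes the approach only at a high level. As a rough illustration of the two ideas it combines, below is a minimal Python sketch under stated assumptions. It is not MDLText: the classifier is a simplified stand-in that assigns a message to the class whose add-alpha smoothed unigram model encodes it in the fewest bits (the core minimum description length intuition), updated incrementally at a cost linear in the number of tokens. The ensemble trains one such model per text representation (original, normalized, normalized plus semantically expanded) and combines their predictions. The NORMALIZE and CONCEPTS tables, the names OnlineCodeLengthClassifier and HybridEnsemble, and the majority-vote rule are hypothetical choices for this sketch, not the paper's.

```python
import math
from collections import Counter, defaultdict


class OnlineCodeLengthClassifier:
    """Simplified MDL-flavoured online classifier (hypothetical stand-in,
    not MDLText). Each class keeps add-alpha smoothed unigram counts; a
    message is assigned to the class whose model encodes it in the fewest
    bits."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.token_counts = defaultdict(Counter)  # class -> token -> count
        self.total_tokens = Counter()             # class -> number of tokens seen
        self.vocab = set()                        # all tokens seen so far

    def partial_fit(self, tokens, label):
        # Incremental update: touches only this message's tokens, so each
        # update costs time linear in the message length.
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.vocab.update(tokens)

    def code_length(self, tokens, label):
        # Code length in bits: sum of -log2 P(token | class), with one extra
        # vocabulary slot reserved for unseen tokens.
        counts = self.token_counts[label]
        denom = self.total_tokens[label] + self.alpha * (len(self.vocab) + 1)
        return sum(-math.log2((counts[t] + self.alpha) / denom) for t in tokens)

    def predict(self, tokens):
        # Minimum description length: pick the class with the shortest code.
        return min(self.token_counts, key=lambda c: self.code_length(tokens, c))


# Hypothetical lookup tables standing in for real resources such as a
# slang/abbreviation dictionary (normalization) and a thesaurus or
# concept database (semantic indexing).
NORMALIZE = {"u": "you", "2": "to", "txt": "text", "fr33": "free"}
CONCEPTS = {"free": ["gratis"], "prize": ["award"], "win": ["gain"]}


def tokenize(text):
    return text.lower().split()


def normalize(tokens):
    # Replace known slang and abbreviations with their canonical forms.
    return [NORMALIZE.get(t, t) for t in tokens]


def semantic_expand(tokens):
    # Append related concepts for each token.
    expanded = list(tokens)
    for t in tokens:
        expanded.extend(CONCEPTS.get(t, []))
    return expanded


# One text representation per ensemble member.
VARIANTS = {
    "original": lambda toks: toks,
    "normalized": normalize,
    "expanded": lambda toks: semantic_expand(normalize(toks)),
}


class HybridEnsemble:
    """Trains one classifier per representation and combines their
    predictions by majority vote (the voting rule is an assumption)."""

    def __init__(self):
        self.models = {name: OnlineCodeLengthClassifier() for name in VARIANTS}

    def partial_fit(self, text, label):
        toks = tokenize(text)
        for name, transform in VARIANTS.items():
            self.models[name].partial_fit(transform(toks), label)

    def predict(self, text):
        toks = tokenize(text)
        votes = Counter(
            self.models[name].predict(transform(toks))
            for name, transform in VARIANTS.items()
        )
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = HybridEnsemble()
    ensemble.partial_fit("win a free prize now", "spam")
    ensemble.partial_fit("are u coming 2 dinner", "ham")
    print(ensemble.predict("u win fr33 prize"))  # → spam
```

Because every member is updated one message at a time through partial_fit, the ensemble stays online, and swapping OnlineCodeLengthClassifier for any other incremental classifier leaves the combination step unchanged, in line with the abstract's point that the ensemble is independent of the classification method.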

Bibliographic details

Source
Expert Systems with Applications | 2017, No. 10 | pp. 314-325 | 12 pages
Authors
Silva Renato M.; Alberto Tulio C.; Almeida Tiago A.; Yamakami Akebo
Affiliations

Univ Campinas UNICAMP, Dept Syst & Energy, Sao Paulo, Brazil;

Fed Univ Sao Carlos UFSCar, Dept Comp Sci, Sao Paulo, Brazil;

Fed Univ Sao Carlos UFSCar, Dept Comp Sci, Sao Paulo, Brazil;

Univ Campinas UNICAMP, Dept Syst & Energy, Sao Paulo, Brazil;

Indexing information
Original format: PDF
Language: English
Keywords
Minimum description length; Short text messages; Semantic indexing; Text categorization; Machine learning

Similar documents

1. Text normalization and semantic indexing to enhance Instant Messaging and SMS spam filtering [J]. Almeida Tiago A., Silva Tiago P., Santos Igor. Knowledge-Based Systems, 2016, Sep. 15 issue.

2. Transductive Learning for Short-Text Classification Problems Using Latent Semantic Indexing [J]. Sarah Zelikovitz, Finella Marquez. International Journal of Pattern Recognition and Artificial Intelligence, 2005, No. 2.

3. Advanced Machine Learning Approach to Handle Filtering Unwanted Messages in Online Social Networks [J]. S. Venkana Babu, N. K. Kameswarao. International Organization of Scientific Research, 2019, No. 9.

4. Semantic Indexing-Based Data Augmentation for Filtering Undesired Short Text Messages [C]. Johannes V. Lochter, Renato M. Silva, Tiago A. Almeida. IEEE International Conference on Machine Learning and Applications, 2018.

5. The Impact of Short Message Service Text Learning Support on Online Course Completion and Student Satisfaction [D]. Boone, Joyce B. 2016.

6. Journal Descriptor Indexing Tool for Categorizing Text According to Discipline or Semantic Type [O]. Susanne M. Humphrey, Chris J. Lu, Willie J. Rogers. 2006.

7. Memetic algorithm for short messaging service spam filter using text normalization and semantic approach [O]. Arnold Adimabua Ojugo, Andrew Okonji Eboka. 2020.

