Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches

Sarah Jane Delany; Derek Bridge

Abstract

Spam filtering is a text classification task to which Case-Based Reasoning (CBR) has been successfully applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advantages of having no set-up costs and being resilient to concept drift. We report an empirical comparison, which shows the feature-free approach to be more accurate than the feature-based system. These results are fairly robust over different compression algorithms in that we find that the accuracy when using a Lempel-Ziv compressor (GZip) is approximately the same as when using a statistical compressor (PPM). We note, however, that the feature-free systems take much longer to classify emails than the feature-based system. Improvements in the classification time of both kinds of systems can be obtained by applying case base editing algorithms, which aim to remove noisy and redundant cases from a case base while maintaining, or even improving, generalisation accuracy. We report empirical results using the Competence-Based Editing (CBE) technique. We show that CBE removes more cases when we use the distance measure based on text compression (without significant changes in generalisation accuracy) than it does when we use the feature-based approach.
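
As a rough illustration of the feature-free idea described in the abstract, the sketch below classifies an email by comparing it with stored cases under a compression-based distance. The abstract does not give the exact distance formula, so the normalized compression distance (NCD) used here is an assumption, as are the k-nearest-neighbour majority vote, the hypothetical names `compressed_size`, `ncd` and `classify`, and the use of Python's `zlib` module (DEFLATE, the algorithm underlying GZip) as the compressor. It is not the ECUE implementation.

```python
import zlib
from collections import Counter


def compressed_size(text):
    """Size in bytes of the DEFLATE-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x, y):
    """Normalized compression distance (assumed measure); smaller means more similar."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(email, case_base, k=3):
    """Majority vote among the k cases nearest to email under NCD.

    case_base is a list of (text, label) pairs; no features are extracted.
    """
    nearest = sorted(case_base, key=lambda case: ncd(email, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Adding a new case is just appending a `(text, label)` pair, which is one way to read the "no set-up costs" claim. Every classification compresses the query together with each stored case, which is in line with the abstract's remark that the feature-free systems classify more slowly and that removing cases from the case base shortens classification time.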

Bibliographic details

Source: Artificial Intelligence Review: An International Science and Engineering Journal, 2006, issue 2, 13 pages
Authors: Sarah Jane Delany; Derek Bridge
Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory
Keywords: Spam filtering; Case-based reasoning; Case-base editing; Case-based maintenance; Feature selection; Distance measures; Text compression

Similar literature

1. Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches [J]. Sarah Jane Delany, Derek Bridge. Artificial Intelligence Review: An International Science and Engineering Journal. 2006, issue 1-2.
2. An Assessment of Case-Based Reasoning for Spam Filtering [J]. Sarah Jane Delany, Padraig Cunningham, Lorcan Coyle. Artificial Intelligence Review: An International Science and Engineering Journal. 2005, issue 3-4.
3. Automated assessment of myocardial SPECT perfusion scintigraphy: A comparison of different approaches of case-based reasoning [J]. Aliasghar Khorsand, Senta Graf, Heinz Sochor. Artificial Intelligence in Medicine. 2007, issue 2.
4. Catching the Drift: Using Feature-Free Case-Based Reasoning for Spam Filtering [C]. Sarah Jane Delany, Derek Bridge. International Conference on Case-Based Reasoning (ICCBR 2007); 13-16 August 2007; Belfast (GB). 2007.
5. A comparison of the rule and case-based reasoning approaches for the automation of help-desk operations at the tier-two level [D]. Bryant, Michael Forrester. 2009.
6. Machine learning for email spam filtering: review approaches and open research problems [O]. Emmanuel Gbenga Dada, Joseph Stephen Bassi, Haruna Chiroma. 2019.
7. Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches [O]. Sarah Jane Delany, Derek Bridge. 2006.
8. Conversation for Textual Case-Based Reasoning [R]. Gupta, K. M., Aha, D. W. 2007.