Identifying Spam Web Pages Based on Content Similarity

Abstract

The Web provides its users with abundant information. Unfortunately, when a Web search is performed, both users and search engines face an annoying problem: misleading Web pages, i.e., spam Web pages, are ranked among legitimate Web pages. The mixed results degrade the performance of search engines and frustrate users, who must filter out useless information. To improve the quality of Web searches, the number of spam pages on the Web must be reduced, if they cannot be eradicated entirely. In this paper, we present a novel approach for identifying spam Web pages that have mismatched titles and bodies and/or low percentage of hidden content. By considering the content of Web pages, we develop a spam-detection tool that is (i) reliable, since we can accurately detect 94% of spam/legitimate Web pages, and (ii) computationally inexpensive, since the word-correlation factors used for content analysis are precomputed. We have verified that our spam-detection approach outperforms existing anti-spam methods by an average of 10% in terms of F-measure.
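The abstract does not give the authors' formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical precomputed table of word-correlation factors (the `CORRELATION` entries and their values are invented for illustration), scores how well a title matches a body by averaging each title word's best correlation with any body word, and includes the standard F-measure (harmonic mean of precision and recall) used to compare detectors. The function names and the averaging rule are assumptions.

```python
from typing import Dict, FrozenSet, List

# Hypothetical precomputed word-correlation factors (illustrative values only).
# Keys are unordered word pairs, so the lookup is symmetric.
CORRELATION: Dict[FrozenSet[str], float] = {
    frozenset(("laptop", "notebook")): 0.9,
    frozenset(("laptop", "battery")): 0.6,
    frozenset(("cheap", "discount")): 0.8,
}


def correlation(a: str, b: str) -> float:
    """Look up the precomputed factor; identical words correlate fully."""
    if a == b:
        return 1.0
    return CORRELATION.get(frozenset((a, b)), 0.0)


def title_body_similarity(title: List[str], body: List[str]) -> float:
    """Average, over title words, of the best correlation with any body word.

    This aggregation rule is an assumption made for the sketch.
    """
    if not title or not body:
        return 0.0
    best = [max(correlation(t, w) for w in body) for t in title]
    return sum(best) / len(best)


def f_measure(precision: float, recall: float) -> float:
    """Standard F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


matched = title_body_similarity(["laptop", "battery"],
                                ["notebook", "battery", "review"])
mismatched = title_body_similarity(["laptop"], ["casino", "poker"])
```

A page whose score falls below some chosen threshold would be flagged as having a mismatched title and body; the threshold itself would need to be tuned on labeled data. Because the factors are looked up rather than computed per page, the cost per page is limited to the table lookups.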

Bibliographic details

Source: International Conference on Computational Science and Its Applications (ICCSA 2008), 2008, pp. 204-219 (16 pages)
Authors: Maria Soledad Pera; Yiu-Kai Ng
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Date indexed: 2022-08-26 14:16:03

Similar literature

1. A structural, content-similarity measure for detecting spam documents on the web [J]. Maria Soledad Pera, Yiu-Kai Ng. International Journal of Web Information Systems. 2009, No. 4
2. POLARITYSPAM: PROPAGATING CONTENT-BASED INFORMATION THROUGH A WEB-GRAPH TO DETECT WEB-SPAM [J]. F. Javier Ortega, Jose A. Troyano, Fermin L. Cruz. International Journal of Innovative Computing Information and Control. 2012, No. 4
3. Feature Selection Model Based Content Analysis for Combating Web Spam [J]. Shipra Mittal, Akanksha Juneja. Computer Science & Information Technology. 2016, No. 4
4. Identifying Spam Web Pages Based on Content Similarity [C]. Maria Soledad Pera, Yiu-Kai Ng. International Conference on Computational Science and Its Applications. 2008
5. Web based content and hybrid teaching: Student perceptions of the effectiveness of using web based content and hyper-linked teaching units in teaching hybrid business and marketing post secondary classes [D]. Richardson, W. Tim G. 2007
6. Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers [O]. Mansour Alsaleh, Abdulrahman Alarifi
7. Identifying Spam Web Pages Based on Content Similarity [O]. Maria Soledad Pera, Yiu-Kai Ng. 2010
