A Splog Filtering Method Based on String Copy Detection

Abstract

Many people now publish blogs, and the blogosphere has become an important information source. It is used for purposes such as trend and reputation analysis and marketing. Like e-mail and web links, however, the blogosphere suffers from spam: many spam blogs (splogs) are generated to lure users to specific sites. This paper proposes a splog filtering method. Splogs are usually generated automatically by copying words and phrases from other documents. The proposed method therefore detects strings that appear in multiple blogs and uses the copy rate of strings as a key feature for splog filtering. To evaluate the method, we constructed an evaluation corpus by gathering blogs randomly over a certain period of time and manually judging whether each blog is a splog. Experiments on this corpus reveal several characteristics of splog filtering by copied-string detection. The proposed method uses a suffix array to detect copied substrings and can judge each blog in O(m^2 log n) time, where n denotes the total length of the documents used for copy detection and m denotes the length of the blog to be judged.
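The copy-detection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_len` threshold, the `\x00` document separator, and the definition of copy rate as the fraction of blog characters covered by a sufficiently long match are all assumptions made for illustration, and the suffix array is built by naive sorting rather than by an efficient construction algorithm. Each of the m start positions in the blog triggers a binary search over the suffix array with comparisons of up to m characters, which is one way a bound of O(m^2 log n) per blog can arise.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(query: str, text: str, sa: list[int]) -> int:
    """Length of the longest prefix of query that occurs somewhere in text."""
    lo, hi = 0, len(sa)
    # Binary search for the position where query would be inserted.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    # The best-matching suffix is adjacent to the insertion point.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_len(query, text[sa[k]:]))
    return best


def copy_rate(blog: str, text: str, sa: list[int], min_len: int = 5) -> float:
    """Fraction of blog characters covered by a match of at least min_len characters.

    min_len is a hypothetical threshold, not a value taken from the paper.
    """
    if not blog:
        return 0.0
    copied = [False] * len(blog)
    for i in range(len(blog)):
        length = longest_match(blog[i:], text, sa)
        if length >= min_len:
            for j in range(i, i + length):
                copied[j] = True
    return sum(copied) / len(blog)


# Reference documents joined by a separator assumed not to occur in any blog.
reference_text = "\x00".join(["cheap watches buy now", "the quick brown fox"])
sa = build_suffix_array(reference_text)
print(copy_rate("buy now!!", reference_text, sa))
```

In this toy example, 7 of the 9 characters in `"buy now!!"` are covered by a match against the reference text. A classifier would then combine such a score with a threshold or other features; the abstract does not specify how.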

Bibliographic details

Source: International Conference on Applications of Digital Information and Web Technologies, 2008, 6 pages
Authors: Takaharu Takeda; Atsuhiro Takasu
Original format: PDF
CLC classification: TP393-53

Similar literature

1. An Image-Based Approach to Video Copy Detection With Spatio-Temporal Post-Filtering [J] . IEEE Transactions on Multimedia . 2010, No. 4
2. An automatic filtering convergence method for iterative impulse noise filters based on PSNR checking and filtered pixels detection [J] . Chen Chao-Yu, Chen Chin-Hsing, Chen Chao-Ho, Expert Systems with Applications . 2016, No. nova
3. Filter-based hybridization capture of subgenomes enables resequencing and copy-number detection [J] . Daniel S Herman, G Kees Hovingh, Oleg Iartchouk, Nature Methods . 2009, No. 7
4. A Splog Filtering Method Based on String Copy Detection [C] . Takaharu Takeda, Atsuhiro Takasu International Conference on Applications of Digital Information and Web Technologies . 2008
5. Model-based seizure detection method using statistically optimal null filters. [D] . Shi, Liying. 2005
6. Motion artifact detection and correction in functional near-infrared spectroscopy: a new hybrid method based on spline interpolation method and Savitzky–Golay filtering [O] . Sahar Jahani, Seyed K. Setarehdan, David A. Boas, 2018
7. Splog Filtering based on Writing Consistency [O] . Wei Liu, Songbo Tan, Hongbo Xu, 2008