
Splog Filtering Based on Writing Consistency


Abstract

Splogs (spam blogs) are a key obstacle to accessing the blogosphere. Existing splog-filtering methods are limited to the techniques of traditional web spam filtering and do not consider the characteristics of blogs. Inspired by the observation that fake writers (writers of splogs) show strikingly more consistent writing behavior than real writers (writers of legitimate blogs), we propose to detect splogs by distinguishing fake writers from real writers. To measure how consistent the writing behavior is, we propose consistency-based features derived from writing interval, writing structure, and writing topic. We then design a splog-filtering system that uses these consistency-based features effectively and flexibly. Experimental results on the Blog06 data set show that the proposed measure detects splogs effectively, reaching an accuracy of 90%. Compared with content-based methods, our approach achieves comparable accuracy with fewer features and a smaller training set, indicating that writing consistency captures the essential difference between splogs and blogs.
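
The abstract does not give the formulas behind the consistency-based features, so the following is only a minimal sketch of how a writing-interval consistency score might be computed. The function name `interval_consistency` and the scoring rule (one over one plus the coefficient of variation of the gaps between posts) are assumptions made for illustration, not the authors' definition.

```python
from statistics import mean, pstdev


def interval_consistency(timestamps):
    """Score how regular the gaps between posts are, in the range (0, 1].

    Hypothetical measure (not taken from the paper): 1 / (1 + CV), where CV
    is the coefficient of variation of the gaps between consecutive posts.
    A score of 1.0 means every gap is identical.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three posts to compare gaps")
    average_gap = mean(gaps)
    if average_gap == 0:
        # All posts share one timestamp: treat as perfectly regular.
        return 1.0
    variation = pstdev(gaps) / average_gap
    return 1.0 / (1.0 + variation)


# Illustrative post times in seconds (made-up values, not Blog06 data).
machine_like = [0, 3600, 7200, 10800, 14400]
human_like = [0, 600, 90000, 95000, 400000]

print(interval_consistency(machine_like))
print(interval_consistency(human_like))
```

Under this assumed measure, a blog that posts at fixed intervals scores higher than one that posts irregularly, which matches the abstract's observation that fake writers behave more consistently. Analogous scores for writing structure and writing topic would need their own definitions, which the abstract does not specify.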
