Web spam challenge proposal for filtering in archives

Abstract

In this paper we propose new tasks for a possible future Web Spam Challenge, motivated by the needs of the archival community. The Web archival community consists of several relatively small institutions that operate independently and possibly over different top-level domains (TLDs). Each of them may have a large set of historic crawls. Efficient filtering would hence require (1) enhanced use of the time series of domain snapshots and (2) collaboration by transferring models across different TLDs. Corresponding Challenge tasks could therefore include the distribution of crawl snapshot data for feature generation as well as the classification of unlabeled new crawls of the same or even different TLDs.