Conference: International Symposium on Computers and Communications

WCOND-Mine: Algorithm for Detecting Web Content Outliers from Web Documents

Abstract

Outlier mining is dedicated to finding data objects that differ significantly from the rest of the data. Outlier mining has been studied extensively in statistics and, more recently, in data mining. However, exploring the web for outliers has received very little attention in the data mining community. Web content outliers are documents with 'varying contents' compared to similar web documents taken from the same domain. Mining web content outliers may lead to the identification of competitors and emerging business patterns in electronic commerce. This paper proposes the WCOND-Mine algorithm for mining web content outliers using n-grams, without a domain dictionary. Experimental results with embedded motifs show that WCOND-Mine is capable of finding web content outliers in web datasets.
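The abstract gives no implementation details for WCOND-Mine. As a rough illustration of the general idea it names (comparing documents from one domain through n-grams, with no domain dictionary), here is a minimal Python sketch. It is not the paper's algorithm: the use of character trigrams, cosine similarity, a score of 1 minus mean similarity, and the function names `ngram_profile`, `cosine`, and `content_outlier_scores` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of n-gram-based content outlier scoring.
# NOT the WCOND-Mine algorithm from the paper; all choices here are assumptions.
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; no domain dictionary is needed."""
    cleaned = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_outlier_scores(docs: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Score each document by how dissimilar it is to the rest of its domain.

    Score = 1 - mean cosine similarity to every other document, so a higher
    score means more outlying. The list is sorted most-outlying first.
    """
    profiles = {name: ngram_profile(text, n) for name, text in docs.items()}
    scores = []
    for name, prof in profiles.items():
        sims = [cosine(prof, other) for o, other in profiles.items() if o != name]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        scores.append((name, 1.0 - mean_sim))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy "domain": three online computer shops and one unrelated page.
    domain = {
        "shop_a": "buy cheap laptops and computers online, free shipping on laptops",
        "shop_b": "laptops and computers for sale online with free shipping",
        "shop_c": "online computer store: laptops, desktops, free shipping",
        "odd": "organic gardening tips for growing tomatoes in spring",
    }
    for name, score in content_outlier_scores(domain):
        print(f"{name}: {score:.3f}")
```

In this toy domain the unrelated gardening page shares few trigrams with the shop pages, so it should rank first. A practical system would also need a rule (a score threshold or top-k cut-off) for deciding which documents count as outliers; that rule is likewise an assumption and is not specified in the abstract.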