Journal of Residuals Science & Technology

Duplication Deletion Mechanism of Web Search Engine Based on Multi View Canonical Correlation Analysis


Abstract

Web spam is a significant problem in information retrieval because it seriously hinders users from obtaining relevant information. The proposed approach first divides the features of spam websites into two views, one based on content features and one based on link features, and applies canonical correlation analysis (CCA) and related improved methods for feature extraction to generate two new sets of features. It then combines the new two-view features in different ways to produce single-view data, which serves as the training data for constructing the classifier. The experimental results show that spam web pages can be treated as two-view data and that applying multi-view canonical correlation analysis effectively improves the precision of spam identification.
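The pipeline described in the abstract (two views, CCA feature extraction, combination into single-view data, classifier training) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the synthetic data, the regularised plain CCA (the paper's "improved methods" are not specified in the abstract), the two combination strategies `concat` and `sum`, and the nearest-centroid classifier. It is not the authors' implementation.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_fit(X, Y, k, reg=1e-3):
    """Plain regularised CCA: whiten each view, then SVD the cross-covariance."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    # Projection matrices for each view and the top-k canonical correlations
    return mx, my, Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

def combine(X, Y, model, how="concat"):
    """Fuse the two projected views into single-view data (hypothetical strategies)."""
    mx, my, Wx, Wy, _ = model
    Zx, Zy = (X - mx) @ Wx, (Y - my) @ Wy
    return np.hstack([Zx, Zy]) if how == "concat" else Zx + Zy

# Synthetic stand-in for a spam data set: both views depend on a shared latent signal.
rng = np.random.default_rng(0)
n, n_train = 400, 300
labels = rng.integers(0, 2, n)                      # 1 = spam, 0 = normal (assumed)
latent = 2.0 * labels[:, None] + rng.normal(size=(n, 2))
content = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))
link = latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(n, 4))

# Fit CCA on the training split only, then project both splits.
model = cca_fit(content[:n_train], link[:n_train], k=2)
train = combine(content[:n_train], link[:n_train], model, how="concat")
test = combine(content[n_train:], link[n_train:], model, how="concat")
train_sum = combine(content[:n_train], link[:n_train], model, how="sum")

# Nearest-centroid classifier on the fused features (an assumed, simple choice).
centroids = np.stack([train[labels[:n_train] == c].mean(axis=0) for c in (0, 1)])
dists = ((test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
pred = dists.argmin(axis=1)
accuracy = (pred == labels[n_train:]).mean()
print("canonical correlations:", model[4])
print("held-out accuracy:", accuracy)
```

Concatenation keeps both projected views side by side, while summation halves the dimensionality; which combination works best on real spam data is an empirical question that this sketch does not answer.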
