Journal of computer sciences

SUPERVISED TERM WEIGHTING METHODS FOR URL CLASSIFICATION

Abstract

Many term weighting methods have been proposed in the literature for Information Retrieval and Text Categorization. Term weighting, a part of the feature selection process, has not yet been explored for the URL classification problem. We classify a web page using its URL alone, without fetching its content; URL-based classification is therefore faster than methods that require the page content. In this study, we investigate the use of term weighting methods for selecting relevant URL features and their impact on the performance of URL classification. We propose a New Relevance Factor (NRF) for the supervised term weighting method to compute the URL term weights, and we perform multiclass classification of URLs using a Naive Bayes classifier. To evaluate the proposed method, we conducted experiments on the ODP dataset. The results show that the proposed supervised term weighting method based on NRF is suitable for URL classification. We achieved an 11% improvement in Precision over existing binary classifier methods and a 22% improvement in F1 over existing multiclass classifiers.
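The pipeline described in the abstract (tokenize the URL, weight each term with a supervised relevance factor, classify with Naive Bayes) can be illustrated with a minimal sketch. The abstract does not give the NRF formula, so the sketch below does not implement NRF; it substitutes the relevance factor rf = log2(2 + a / max(1, c)) known from the tf.rf scheme in text categorization, where a is the number of URLs in the class that contain the term and c is the number of URLs in all other classes that contain it. The tokenizer, the stop-token list, the way the weights enter the Naive Bayes counts, and the `.example` URLs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop list: scheme and host debris that carries no topical signal.
STOP_TOKENS = {"http", "https", "www", "com", "org", "net", "html", "htm"}

def tokenize(url):
    """Split a URL into lowercase alphabetic tokens, dropping stop tokens."""
    tokens = re.split(r"[^a-z]+", url.lower())
    return [t for t in tokens if len(t) > 1 and t not in STOP_TOKENS]

def relevance_factors(docs, labels):
    """Return rf[(term, cls)] = log2(2 + a / max(1, c)).

    a = URLs of class cls containing the term,
    c = URLs of every other class containing the term.
    This is the tf.rf relevance factor, used here as a stand-in for NRF.
    """
    df = defaultdict(Counter)  # term -> class -> document frequency
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            df[t][label] += 1
    rf = {}
    for t, per_class in df.items():
        total = sum(per_class.values())
        for cls in set(labels):
            a = per_class[cls]
            rf[(t, cls)] = math.log2(2 + a / max(1, total - a))
    return rf

def train(urls, labels):
    """Fit a multinomial Naive Bayes model on rf-weighted term counts."""
    docs = [tokenize(u) for u in urls]
    rf = relevance_factors(docs, labels)
    vocab = {t for d in docs for t in d}
    weighted = defaultdict(Counter)  # class -> term -> weighted count
    for tokens, label in zip(docs, labels):
        for t, tf in Counter(tokens).items():
            weighted[label][t] += tf * rf[(t, label)]
    priors = Counter(labels)
    model = {}
    for cls in priors:
        # Laplace smoothing over the shared vocabulary.
        denom = sum(weighted[cls].values()) + len(vocab)
        log_like = {t: math.log((weighted[cls][t] + 1) / denom) for t in vocab}
        model[cls] = (math.log(priors[cls] / len(labels)), log_like)
    return model

def predict(model, url):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    tokens = tokenize(url)
    scores = {
        cls: log_prior + sum(log_like[t] for t in tokens if t in log_like)
        for cls, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

# Toy training set with made-up URLs and labels (not taken from ODP).
urls = [
    "http://www.matchday.example/football/scores",
    "http://sports.example/football/league",
    "http://www.kitchen.example/recipes/pasta",
    "http://cooking.example/recipes/soup",
]
labels = ["Sports", "Sports", "Cooking", "Cooking"]
model = train(urls, labels)

print(predict(model, "http://news.example/football/today"))
print(predict(model, "http://blog.example/recipes/cake"))
```

Folding the weight into the per-class term counts is only one possible design; the weights could equally be used to rank and prune URL features before training an unweighted classifier.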
