International Conference on Computer Recognition Systems

Implicit Links-Based Techniques to Enrich K-Nearest Neighbors and Naive Bayes Algorithms for Web Page Classification


Abstract

The web has developed into one of the most relevant data sources and has now become a broad knowledge base for almost every field. Its content grows quickly, and its size increases every day. Because of this large volume of data, web page classification has become crucial: users have difficulty finding what they are looking for, even when they use search engines. Web page classification is the process of assigning a web page to one or more classes based on previously seen labeled examples. Web pages contain many contextual features that can be used to improve classification accuracy. In this paper, we present a similarity computation technique based on implicit links extracted from the query log, used with K-Nearest Neighbors (KNN) for web page classification. We also introduce an implicit links-based probability computation method used with Naive Bayes (NB) for web page classification. The newly computed similarity and probability enrich KNN and NB, respectively. Experiments are conducted on two subsets of the Open Directory Project (ODP). Results show that (1) when used as the similarity measure for KNN, the implicit links-based similarity improves results, and (2) the implicit links-based probability improves on the results NB achieves using only text-based probability.
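The abstract does not give the exact formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that an implicit link joins two pages that users clicked for the same query in the query log; that the KNN similarity is the Jaccard overlap of the two pages' query sets; and that the NB link-based term is the add-one-smoothed share of implicitly linked labeled pages in each class, mixed with an ordinary text-based multinomial NB score through a hypothetical weight `alpha`. All function names, the toy log, and `alpha` are illustrative.

```python
import math
from collections import Counter, defaultdict

# Assumption: two pages are implicitly linked if both were clicked for the same query.
def build_query_sets(click_log):
    """Map each page URL to the set of queries for which it was clicked."""
    queries = defaultdict(set)
    for query, url in click_log:
        queries[url].add(query)
    return queries

def implicit_similarity(a, b, queries):
    """Assumed similarity: Jaccard overlap of two pages' query sets."""
    qa, qb = queries.get(a, set()), queries.get(b, set())
    union = qa | qb
    return len(qa & qb) / len(union) if union else 0.0

def knn_classify(page, labeled, queries, k=3):
    """Similarity-weighted vote over the k most implicitly similar labeled pages."""
    neighbours = sorted(
        ((implicit_similarity(page, p, queries), c) for p, c in labeled.items() if p != page),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, c in neighbours:
        if sim > 0:  # ignore neighbours with no shared queries
            votes[c] += sim
    return votes.most_common(1)[0][0] if votes else None

def train_text_nb(docs):
    """Multinomial NB counts; docs is a list of (tokens, class) pairs."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in docs:
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def text_log_prob(tokens, c, model):
    """log P(c) plus the sum of log P(t | c), with add-one smoothing."""
    class_counts, word_counts, vocab = model
    lp = math.log(class_counts[c] / sum(class_counts.values()))
    denom = sum(word_counts[c].values()) + len(vocab)
    for t in tokens:
        lp += math.log((word_counts[c][t] + 1) / denom)
    return lp

def link_log_prob(page, c, labeled, queries, classes):
    """Assumed link term: smoothed share of implicitly linked labeled pages in class c."""
    mine = queries.get(page, set())
    linked = [lab for p, lab in labeled.items() if p != page and mine & queries.get(p, set())]
    hits = sum(1 for lab in linked if lab == c)
    return math.log((hits + 1) / (len(linked) + len(classes)))

def nb_classify(page, tokens, model, labeled, queries, alpha=0.5):
    """Pick the class maximising a hypothetical alpha-weighted mix of text and link log-probabilities."""
    classes = list(model[0])
    return max(
        classes,
        key=lambda c: (1 - alpha) * text_log_prob(tokens, c, model)
        + alpha * link_log_prob(page, c, labeled, queries, classes),
    )

# Toy, made-up data purely for illustration.
log = [("python tutorial", "a.com"), ("python tutorial", "b.com"),
       ("football scores", "c.com"), ("football scores", "d.com"),
       ("python tutorial", "x.com")]
queries = build_query_sets(log)
labeled = {"a.com": "programming", "b.com": "programming",
           "c.com": "sports", "d.com": "sports"}
model = train_text_nb([(["python", "code"], "programming"), (["loop", "python"], "programming"),
                       (["goal", "match"], "sports"), (["team", "goal"], "sports")])

print(knn_classify("x.com", labeled, queries))                                       # programming
print(nb_classify("x.com", ["goal", "match"], model, labeled, queries, alpha=0.5))  # sports
print(nb_classify("x.com", ["goal", "match"], model, labeled, queries, alpha=0.9))  # programming
```

In this toy run the text evidence alone favours `sports` for `x.com`, while its implicit links point to `programming`; raising `alpha` shifts the decision toward the link evidence. How the paper actually weights or combines the two probabilities is not stated in the abstract.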
