Conference: International Conference on Computer and Communication Technology
Novel frequent sequential patterns based probabilistic model for effective classification of web documents

Abstract

Web page classification has been one of the essential tasks in web information retrieval. It supports applications such as delivering content-specific search results, focused crawling, and maintaining web-directory projects like DMOZ. This paper presents a novel probabilistic web page classification scheme that uses the occurrences of frequent sequential patterns to determine the class of a document. Many previous works in text mining have suggested that patterns carry more relevant information about a document than individual words do. This paper attempts to exploit this hypothesis for the classification of web documents. After testing this approach on the RCV1 dataset, we were able to classify the test documents with 88% accuracy.
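The abstract does not say how the frequent sequential patterns are mined or how their occurrences are turned into class probabilities. The following is a minimal illustrative sketch under explicit assumptions, and it is not the authors' model.

Each pattern is assumed to be either a single word or an ordered word pair whose second word appears within `max_gap` positions of the first. A pattern counts as "frequent" if at least `min_support` training documents of some class contain it. Pattern presence or absence is then combined with a Bernoulli naive Bayes score using Laplace smoothing.

The names `sequential_patterns`, `PatternClassifier`, `min_support`, and `max_gap` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def sequential_patterns(tokens, max_gap=2):
    """Distinct sequential patterns of one document (hypothetical definition).

    A pattern is a single word, or an ordered pair (a, b) where b occurs within
    the next `max_gap` positions after a. Returning a set means support counts
    documents containing a pattern, not repeated occurrences.
    """
    patterns = {(tok,) for tok in tokens}
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + 1 + max_gap]:
            patterns.add((first, second))
    return patterns


class PatternClassifier:
    """Bernoulli naive Bayes over frequent sequential patterns (illustrative only)."""

    def __init__(self, min_support=2, max_gap=2):
        self.min_support = min_support  # min. training docs of one class containing a pattern
        self.max_gap = max_gap

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_count = Counter(labels)
        self.total_docs = len(labels)
        self.support = defaultdict(Counter)  # class -> pattern -> document frequency
        for tokens, label in zip(docs, labels):
            self.support[label].update(sequential_patterns(tokens, self.max_gap))
        # Keep a pattern if it is frequent in at least one class.
        self.vocab = {
            p
            for c in self.classes
            for p, s in self.support[c].items()
            if s >= self.min_support
        }
        return self

    def log_score(self, present, c):
        """log P(c) plus, for every frequent pattern, log P(present or absent | c)."""
        n_c = self.doc_count[c]
        score = math.log(n_c / self.total_docs)
        for p in self.vocab:
            # Laplace-smoothed probability that a class-c document contains p.
            prob = (self.support[c][p] + 1) / (n_c + 2)
            score += math.log(prob if p in present else 1.0 - prob)
        return score

    def predict(self, tokens):
        present = sequential_patterns(tokens, self.max_gap) & self.vocab
        return max(self.classes, key=lambda c: self.log_score(present, c))


# Toy usage with made-up documents (not RCV1).
train_docs = [
    "stock market shares fell".split(),
    "stock market shares rose".split(),
    "football match final score".split(),
    "football match goal score".split(),
]
train_labels = ["finance", "finance", "sport", "sport"]

clf = PatternClassifier(min_support=2, max_gap=2).fit(train_docs, train_labels)
print(clf.predict("market shares rose sharply".split()))  # → finance
```

The Bernoulli form scores both the presence and the absence of every frequent pattern. This is one simple way to make "occurrences of patterns" probabilistic.

A realistic setup on a corpus the size of RCV1 would need some additional pieces:
- a dedicated sequential pattern miner, such as PrefixSpan, in place of this bounded-gap pair enumeration;
- proper tokenization;
- pruning of the pattern vocabulary.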
