Novel frequent sequential patterns based probabilistic model for effective classification of web documents

Abstract

Web page classification is an essential task in web information retrieval, supporting applications such as delivering content-specific search results, focused crawling, and maintaining web-directory projects like DMOZ. This paper presents a novel probabilistic web page classification scheme that uses the occurrences of frequent sequential patterns to determine the class of a document. As suggested by many previous works in the field of text mining, patterns carry more relevant information about a document than individual words. This paper attempts to exploit this hypothesis for the classification of web documents. When tested on the RCV1 dataset, the approach classified the test documents with 88% accuracy.
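The abstract does not specify how patterns are mined or how the probabilities are computed, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes contiguous word n-grams as a stand-in for frequent sequential patterns (true sequential patterns may allow gaps) and a naive Bayes score with Laplace smoothing as a stand-in for the probabilistic model. The function names `ngrams`, `train`, and `classify`, the `min_support` threshold, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, max_len=2):
    """Return the set of contiguous word sequences of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def train(docs, labels, min_support=2, max_len=2):
    """Keep patterns found in at least min_support documents, then count
    per class how many documents contain each kept pattern."""
    doc_patterns = [ngrams(d.lower().split(), max_len) for d in docs]
    support = Counter(p for pats in doc_patterns for p in pats)
    frequent = {p for p, c in support.items() if c >= min_support}
    counts = defaultdict(Counter)
    for pats, label in zip(doc_patterns, labels):
        counts[label].update(pats & frequent)
    return {"frequent": frequent, "class_docs": Counter(labels),
            "counts": counts, "max_len": max_len}


def classify(model, doc):
    """Return the class with the highest smoothed log-probability."""
    pats = ngrams(doc.lower().split(), model["max_len"]) & model["frequent"]
    total = sum(model["class_docs"].values())
    scores = {}
    for label, n_docs in model["class_docs"].items():
        score = math.log(n_docs / total)  # class prior
        for p in pats:
            # Laplace-smoothed estimate of P(pattern present | class)
            score += math.log((model["counts"][label][p] + 1) / (n_docs + 2))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration (not RCV1).
docs = ["stock market falls", "stock market rises",
        "team wins match", "team loses match"]
labels = ["finance", "finance", "sport", "sport"]
model = train(docs, labels)
print(classify(model, "stock market news"))
```

In this sketch only patterns present in the unseen document contribute to the score, which keeps the example short; a fuller treatment might also account for absent patterns.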

Bibliographic information

Source
International Conference on Computer and Communication Technology | 2014 | pp. 361-371 | 11 pages
Authors
Haleem Hammad; Sharma Praveen Kumar; Sufyan Beg M.M.
Original format: PDF
Keywords
Internet; classification; data mining; information retrieval; probability; RCV1 dataset; Web document classification; Web information retrieval; Web page classification; frequent sequential pattern; probabilistic model; text mining; Abstracts; Accuracy; Probabilistic logic; Testing; Text mining; Web pages;

Similar literature

1. Extraction of Frequent Sequential Patterns From Web Usage Data and Their Applications In Pre-Fetching Rules Generation For Effective Web Latency Reduction [J]. Badong Chen, Yueqin Zhu. Advances in applied computational mechanics. 2018, No. 1

2. Extraction of Frequent Sequential Patterns From Web Usage Data and Their Applications In Pre-Fetching Rules Generation For Effective Web Latency Reduction [J]. Nooredin Ghadiri Massoom. Advances in applied computational mechanics. 2017, No. 1

3. Effective Temporal Data Classification By Integrating Sequential Pattern Mining And Probabilistic Induction [J]. Vincent S. Tseng, Chao-Hui Lee. Expert systems with applications. 2009, No. 5

4. Novel frequent sequential patterns based probabilistic model for effective classification of web documents [C]. Haleem Hammad, Sharma Praveen Kumar, Sufyan Beg M.M. International Conference on Computer and Communication Technology. 2014

5. Towards accurate and efficient classification: A discriminative and frequent pattern-based approach [D]. Cheng, Hong. 2008

6. Social mixing patterns for transmission models of close contact infections: exploring self-evaluation and diary-based data collection through a web-based interface [O]. P. BEUTELS, Z. SHKEDY, M. AERTS. 2006

7. A Neoteric Web Recommender System based on Approach of Mining Frequent Sequential Pattern from Customized Web Log Preprocessing [O]. Manisha Valera, Kirit Rathod, Uttam Chauhan. 2014

8. SLPMiner: An Algorithm for Finding Frequent Sequential Patterns Using Length-Decreasing Support Constraint [R]. Seno, M., Karypis, G. 2002
