A Novel Efficient Classification Algorithm for Search Engines


Abstract

This paper proposes a new algorithm for classifying Web documents into a set of categories. The technique analyzes the relationships between documents and the terms they contain, and produces a set of rules relating a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent word combinations with its category, and the relationships among these graphs and the documents' categories are captured. The technique has three phases. The first is a training phase, in which human experts determine the categories of a set of web pages and articles, and the supervised classification algorithm associates each category with weighted index terms according to the highest-supported rules among the most frequent words. The second is a blind categorization phase, in which a web crawler traverses the World Wide Web to build a database of URLs, each categorized according to the results of the first phase. The third phase applies the proposed graph representation technique to the whole set of documents in each category to determine the category's final graph representation. This third phase produces better classification rules because the sample is larger, at no additional cost in supervised categorization. Experiments are conducted on data sets collected from different Web portals.
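The three phases can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify how rules are weighted or how graphs are compared, so the sketch assumes a category graph is a count of co-occurring frequent-term pairs and that a document is assigned to the category whose graph shares the most edge weight with it. The function names (`top_terms`, `build_category_graphs`, `classify`, `refine`) and the parameter `k` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_terms(text, k=5):
    """Return the k most frequent lowercase tokens in a document."""
    counts = Counter(text.lower().split())
    return [term for term, _ in counts.most_common(k)]


def doc_edges(text, k=5):
    """Pairs of frequent terms in one document (assumed graph edges)."""
    terms = sorted(set(top_terms(text, k)))
    return list(combinations(terms, 2))


def build_category_graphs(labeled_docs, k=5):
    """Phase 1 (sketch): per category, count documents supporting each edge."""
    graphs = {}
    for text, category in labeled_docs:
        graphs.setdefault(category, Counter()).update(doc_edges(text, k))
    return graphs


def classify(text, graphs, k=5):
    """Phase 2 (sketch): choose the category with the largest shared edge weight."""
    edges = doc_edges(text, k)
    scores = {cat: sum(g[e] for e in edges) for cat, g in graphs.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Return None when no edge of the document appears in any category graph.
    return best if scores[best] > 0 else None


def refine(labeled_docs, crawled_docs, k=5):
    """Phase 3 (sketch): rebuild graphs from expert-labeled plus auto-labeled documents."""
    graphs = build_category_graphs(labeled_docs, k)
    enlarged = list(labeled_docs)
    for text in crawled_docs:
        category = classify(text, graphs, k)
        if category is not None:
            enlarged.append((text, category))
    return build_category_graphs(enlarged, k)
```

In this sketch, `refine` enlarges each category's sample with automatically labeled documents, which mirrors the abstract's point that the third phase needs no further expert labeling. A real system would also need tokenization, stop-word removal, and a support threshold, none of which the abstract details.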


Bibliographic record

Source: WSEAS International Conference on Applied Informatics and Communications | 2008 | 7 pages
Authors: HANAN AHMED; HOSNI MAHMOUD; ABD ALLA
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: Information Processing on the Web; Supervised Classification; Document Classification

Similar documents

1. Software assisted clamping point classification and position optimization for the efficient flexibilization of carbody fixtures using mathematical geometry-based search algorithms [J]. Rayk Fritzsche, Robert Schaffrath, Marcel Todtermuschke. Procedia CIRP. 2021, No. Suppla1
2. Efficient hybrid algorithm based on moth search and fireworks algorithm for solving numerical and constrained engineering optimization problems [J]. Han Xiaoxia, Yue Lin, Dong Yingchao. Journal of Supercomputing. 2020, No. 12
3. When search engines stopped being human: menu interfaces and the rise of the ideological nature of algorithmic search [J]. Niels Kerssens. Internet Histories. 2017, No. 3a4
4. A Novel Efficient Classification Algorithm for Search Engines [C]. HANAN AHMED, HOSNI MAHMOUD, ABD ALLA. WSEAS International Conference on Applied Informatics and Communications. 2008
5. Efficient Algorithms for Search Engine Query Processing [D]. Dimopoulos, Konstantinos. 2016
6. Use of administrative and electronic health record data for development of automated algorithms for childhood diabetes case ascertainment and type classification: the SEARCH for Diabetes in Youth Study [O]. Victor W. Zhong, Emily R. Pfaff, Daniel P. Beavers, -1
7. Efficient Nearest Neighbor Classification with Data Reduction and Fast Search Algorithms [O]. J. S. Sánchez, J. M. Sotoca, F. Pla. 2004
