A Novel Technique for Web Pages Clustering Using LSA and K-Medoids Algorithm

Abstract

The sheer extent and variety of documents available on the web pose a critical challenge for important tasks such as information retrieval (IR), content monitoring, and indexing. A web document can be any type of data that a user requests and a web server delivers through a web browser. Most web documents contain textual content and are typically called web pages. To perceive and discover knowledge from these pages, novel techniques are required that have not been applied in other domains. This paper proposes a new approach that performs latent semantic analysis (LSA) on the output of the vector space model (VSM), which captures the correlation between web pages and their extracted features. LSA yields matrices that reflect the correlation between web pages and their related concepts; such matrices are frequently used in retrieval. The PAM (K-Medoids) algorithm is then applied in the semantic space to partition the web pages into coherent groups. One of the main challenges in any clustering algorithm is identifying the correct number of clusters for the given data. Two approaches are used for this purpose: elbow graph analysis, which estimates a range for the number of clusters based on sum of squared errors (SSE) values, and clustering evaluation metrics. The Calinski-Harabasz criterion (CH) and the Silhouette Coefficient (SC) are among the best-known evaluation metrics for partitioning-based algorithms. UOT was used to evaluate the proposed system, and the results show that the system separates similar pages into coherent groups with high accuracy.
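
The abstract gives no implementation details, so the following is only a minimal sketch of part of the described pipeline, under stated assumptions. It builds a small TF-IDF vector space model, runs a PAM-style K-Medoids (greedy BUILD, then first-improvement SWAP) on cosine distances, and picks the number of clusters with the Silhouette Coefficient. The six sample pages, the function names, and the candidate range of k are all hypothetical. The LSA projection (a truncated SVD of the term matrix), the elbow analysis, and the Calinski-Harabasz criterion are left out to keep the example free of dependencies, so the clustering here runs directly on the TF-IDF vectors rather than in the semantic space the paper uses.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Vector space model: one sparse TF-IDF vector (term -> weight) per page."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(tokens))
                  * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine_distance(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(vectors):
    n = len(vectors)
    return [[0.0 if i == j else cosine_distance(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]

def total_cost(D, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam(D, k):
    n = len(D)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(D, medoids + [i]))
        medoids.append(best)
    # SWAP: accept any medoid/non-medoid exchange that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(D, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels

def silhouette(D, labels):
    """Mean silhouette over all points; singleton clusters score 0."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(D[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy "pages": two per topic, vocabularies disjoint across topics.
pages = [
    "football match goal team", "team football league goal",
    "recipe oven bake bread", "bread recipe flour oven",
    "stock market price trade", "market stock trade bank",
]
D = distance_matrix(tfidf_vectors(pages))

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 5):  # assumed candidate range for the number of clusters
    medoids, labels = pam(D, k)
    score = silhouette(D, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(best_k, best_labels, round(best_score, 3))
```

On real web pages the distance matrix would be computed on the reduced LSA vectors instead, and the silhouette curve would typically be read together with an elbow plot rather than taken on its own.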


Bibliographic details

Source
International Conference on Research in Intelligent and Computing in Engineering | 2020 | xx 579-1204 pages | 9 pages in total
Authors
Nora Omran Alkaam; Noor A. Neamah; Faris Sahib Al-Rammahi;
Original format: PDF
Chinese Library Classification: TP301.4-532
Keywords
Clustering; Web mining; Data mining; Web content mining; PAM; K-Medoids; Silhouette coefficient; Calinski-Harabasz criterion;


Similar literature


1. Improving Customer Clustering by Optimal Selection of Cluster Centroids in K-Means and K-Medoids Algorithms [J]. Shahla Mousavi, Farsad Zamani Boroujeni, Saeed Aryanmehr. Journal of Theoretical and Applied Information Technology, 2020, No. 18.
2. A novel landslide susceptibility mapping portrayed by OA-HD and K-medoids clustering algorithms [J]. Hu Jian, Xu Kaibin, Wang Genglong. Bulletin of Engineering Geology and the Environment, 2021, No. 2.
3. Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms [J]. Schubert Erich, Rousseeuw Peter J. Information Systems, 2021, No. Nova.
4. A Novel Technique for Web Pages Clustering Using LSA and K-Medoids Algorithm [C]. Nora Omran Alkaam, Noor A. Neamah, Faris Sahib Al-Rammahi. International Conference on Research in Intelligent and Computing in Engineering, 2020.
5. Clustering Students' Metacognitive Beliefs: Comparing the Results of K-Means and K-Medoids Algorithms [D]. Bukoski, Elizabeth. 2018.
6. Clustering and Characterization of the Lactation Curves of Dairy Cows Using K-Medoids Clustering Algorithm [O]. Mingyung Lee, Seonghun Lee, Jaehwa Park. 2020.
7. Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids [O]. Edy Irwansyah, Ebiet Salim Pratama, Margaretha Ohyver. 2020.
