An effective approach for semantic-based clustering and topic-based ranking of web documents

Rajendra Kumar Roul

Abstract

On the large, dynamic and ever-expanding web, extracting the information a user wants for a given query is a significant problem for search engines. Clustering and ranking are two important techniques that can help address it. To realize such a combined clustering-ranking mechanism, this study proposes an approach that pairs semantic-based clustering with topic-based ranking of web documents. The proposed clustering approach combines latent semantic indexing (LSI) with the min-cut algorithm. To make the clustering more effective, a new feature selection method, called clustering-based feature selection, has been developed; it seeks the feature set that captures the crux of the documents in the corpus without degrading the outcome of the construction process. While LSI completely overcomes the constraint of synonymy, the min-cut algorithm helps to generate efficient clusters at each stage of the clustering process. The number of clusters to form is decided using the silhouette coefficient, a measure that incorporates both the cohesion and the separation of clusters. To rank the documents in each semantic cluster, the proposed approach transforms the text into topics using latent Dirichlet allocation (LDA) and then applies inverted indexing to those topics. The 20-Newsgroups and DMOZ datasets are used for the experiments, and the results show that the clustering approach performs better than traditional clustering approaches and that the ranking approach is promising.
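To illustrate how a silhouette coefficient can guide the choice of cluster count, here is a minimal Python sketch. It is not the paper's implementation: Euclidean distance, the toy 2-D points, and the names `silhouette_coefficient`, `points` and `candidates` are assumptions made for this example, and the candidate labelings are written by hand rather than produced by LSI and min-cut.

```python
from math import dist
from statistics import mean


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; labels[i] is the cluster of points[i]."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # Cohesion a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # Separation b: smallest mean distance to the members of another cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Toy data (assumed): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Hand-written candidate labelings, keyed by number of clusters (assumed).
candidates = {
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
}

# Keep the cluster count whose labeling scores the highest silhouette.
best_k = max(candidates, key=lambda k: silhouette_coefficient(points, candidates[k]))
print(best_k)  # prints 2
```

A value near 1 indicates tight, well-separated clusters, while a negative value suggests points sit closer to another cluster than to their own. In a real pipeline the labelings would come from the clustering step, and a cosine-based distance may suit document vectors better than the Euclidean distance assumed here.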

Bibliographic details

Source
International Journal of Data Science and Analytics, 2018, No. 4, pp. 269-284 (16 pages)
Author
Rajendra Kumar Roul
Author affiliation
Department of Computer Science and Information Systems, BITS, Pilani-K.K.Birla Goa Campus, Zuarinagar, Goa 403726, India
Original format: PDF
Language of text: English
Keywords
Correlation; Inverted indexing; Latent Dirichlet allocation; Latent semantic indexing; Min-cut; Silhouette coefficient

Similar literature

1. An Efficient Approach for Ranking of Semantic Web Documents by Computing Semantic Similarity and Using HCS Clustering [J]. Poonam Chahal, Manjeet Singh. International Journal of Signs and Semiotic Systems, 2021, No. 1.
2. A novel approach of cluster based optimal ranking of clicked URLs using genetic algorithm for effective personalized web search [J]. Chawla Suruchi. Applied Soft Computing, 2016.
3. A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results [J]. Takazumi Matsumoto, Edward Hung. Journal of Intelligent Information Systems, 2012, No. 2.
4. An effective implementation of Social Spider Optimization for text document clustering using single cluster approach [C]. T. Ravi Chandran, A.V. Reddy, B. Janet. 2018 Second International Conference on Inventive Communication and Computational Technologies, 2018.
5. Increasing trustworthiness in Web-page searches by using an alternative approach for Web-page ranking [D]. Parrell, Daniel J. 2008.
6. A Document Clustering and Ranking System for Exploring MEDLINE Citations [O]. Yongjing Lin, Wenyuan Li, Keke Chen. 2007.
7. A Semantic Web Approach for Improving Ranking Model of Web Document [O]. G. Charles Babu, Pvrd Prasada, Rao N. Sandhya. 2011.
8. Heuristic Ranking and Diversification of Web Documents [R]. He, J., Balog, K., Hofmann, K. 2009.
