Journal: International Journal of Applied Engineering Research

Semantically Document Clustering Using Contextual Similarities


Abstract

Document clustering in a high-dimensional document space can be performed with term-level, sentence-level, and concept-level techniques. Most existing techniques suffer from problems such as the two-variable problem, high computational time, and low similarity relatedness, all of which reduce clustering efficiency. To overcome these drawbacks, this paper proposes a hybrid clustering algorithm called the Semantically Document Clustering algorithm. It combines features of the Directed Ridge Regression (DRR), Fuzzy relational Hierarchical Clustering (FHC), and Conceptual clustering methods presented in our previous research. The algorithm uses the semantic weights of terms related to concepts from Wikipedia and WordNet to categorize the text in the documents. The similarity between sentences is then calculated with the Jiang and Conrath measure, which considers the concept weight and the similarity measure for effective clustering. DRR is applied to build a Laplacian matrix, and the diagonal elements of the normalized Laplacian matrix are varied to solve the two-variable problem. Fuzzy hierarchical rules are then employed to classify the rows of the normalized Laplacian matrix into classes, from which the memberships of the observations and the center vectors are calculated. In this way, term relatedness, sentence relatedness, and concept relatedness can be calculated and the documents can be clustered efficiently. Experimental results show that the proposed hybrid method provides more accurate document clustering than state-of-the-art clustering methods.
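Two building blocks named in the abstract have standard textbook definitions, and the minimal sketch below illustrates them under stated assumptions. It is not the paper's implementation. The Jiang and Conrath distance is computed from information-content (IC) values, and the symmetric normalized Laplacian is computed from an affinity matrix. The function names, the inverse-distance form of the similarity, the `eps` guard, and the toy numbers are all assumptions made for illustration. The paper's concept weighting, its variation of the diagonal elements, and the fuzzy hierarchical step are not reproduced.

```python
import math


def jcn_distance(ic_a, ic_b, ic_lcs):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lowest common subsumer)."""
    return ic_a + ic_b - 2.0 * ic_lcs


def jcn_similarity(ic_a, ic_b, ic_lcs, eps=1e-12):
    """Inverse-distance similarity; eps (an assumption) avoids division by zero."""
    return 1.0 / (jcn_distance(ic_a, ic_b, ic_lcs) + eps)


def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.

    W is a square, symmetric, non-negative affinity matrix (list of lists).
    Rows with zero degree get a zero scaling factor, so their diagonal stays 1.
    """
    n = len(W)
    degrees = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical IC values for two concepts and their lowest common subsumer.
distance = jcn_distance(5.0, 6.0, 4.0)
similarity = jcn_similarity(5.0, 6.0, 4.0)

# Toy affinity matrix for three sentences (made-up values, zero diagonal).
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
L = normalized_laplacian(W)
```

In practice the IC values would come from corpus statistics over a concept hierarchy such as WordNet, and the affinity matrix would hold pairwise sentence similarities rather than the fixed numbers used here.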