Knowledge-based vector space model for text clustering

Liping Jing; Michael K. Ng; Joshua Z. Huang

Abstract

This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g. words or concepts) are incorporated when text documents are represented as a set of vectors. The aim is to compute the dissimilarity between two documents more effectively and thereby improve text clustering results. The semantic relationship between two terms is defined by their similarity, which is used to re-weight term frequencies in the VSM. We consider and study two similarity measures for computing the semantic relationship between two terms, based on two different approaches. The first approach uses existing ontologies such as WordNet and MeSH: we define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach uses text corpora to construct relationships between terms and then calculates their semantic similarities. Three clustering algorithms (bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm) were used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that clustering performance based on the new model was much better than that based on the traditional term-based VSM.
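The abstract does not state the formulas, so the Python sketch below only illustrates the general idea under explicit assumptions. A hypothetical toy child-to-parent hierarchy stands in for WordNet or MeSH. Term similarity is a plain edge-counting measure, 1 / (1 + path length); the paper's measure additionally combines average distance and position weighting, which are not reproduced here. The re-weighting x'_{j,i} = x_{j,i} + Σ_{k≠i} s(t_i, t_k) · x_{j,k} is an assumed form of "using similarity to re-weight term frequency". All names (PARENT, edge_count, term_similarity, reweight, cosine_dissimilarity) are illustrative, not taken from the paper.

```python
from math import sqrt

# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet or MeSH.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(term):
    """Return [term, parent, grandparent, ..., root] for a term in PARENT."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def edge_count(a, b):
    """Edges on the path from a to b through their lowest common ancestor."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def term_similarity(a, b):
    """Simplified edge-counting similarity: 1.0 for identical terms, smaller for distant ones."""
    return 1.0 / (1.0 + edge_count(a, b))


def reweight(tf, terms):
    """Assumed re-weighting: x'[j][i] = x[j][i] + sum_{k != i} s(t_i, t_k) * x[j][k].

    Since s(t_i, t_i) == 1 and S is symmetric, this equals the row-vector product x[j] @ S.
    """
    n = len(terms)
    sim = [[term_similarity(a, b) for b in terms] for a in terms]
    return [[sum(row[k] * sim[i][k] for k in range(n)) for i in range(n)] for row in tf]


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; raises on an all-zero vector."""
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(u, v)) / norm


if __name__ == "__main__":
    terms = ["dog", "cat", "car"]
    tf = [
        [2, 0, 0],  # document A mentions only "dog"
        [0, 3, 0],  # document B mentions only "cat"
        [0, 0, 1],  # document C mentions only "car"
    ]
    kb = reweight(tf, terms)
    print(round(cosine_dissimilarity(tf[0], tf[1]), 3))  # plain VSM, A vs B: 1.0
    print(round(cosine_dissimilarity(kb[0], kb[1]), 3))  # re-weighted, A vs B: 0.386
    print(round(cosine_dissimilarity(kb[0], kb[2]), 3))  # re-weighted, A vs C: 0.581
```

In the plain term-based VSM the three single-term documents are mutually orthogonal, so every pair has dissimilarity 1.0. After re-weighting, the "dog" and "cat" documents end up closer than the "dog" and "car" documents because their terms share a nearer common ancestor in the hierarchy, which is the kind of effect the knowledge-based VSM is meant to capture.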

Bibliographic details

Source: Knowledge and Information Systems, 2010, No. 1, 21 pages
Authors: Liping Jing; Michael K. Ng; Joshua Z. Huang
Original format: PDF
Language: English
Chinese Library Classification: Automation system theory
Keywords: Text clustering; Knowledge-based VSM; Term similarity; Semantic relationship

Similar documents

1. Knowledge-based vector space model for text clustering [J]. Liping Jing, Michael K. Ng, Joshua Z. Huang. Knowledge and Information Systems, 2010, No. 1.
2. Semantic Text Clustering Using Enhanced Vector Space Model Using Nepali Language [J]. Chiranjibi Sitaula. Computer Sciences and Telecommunications, 2012, No. 4.
3. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication, 2020, No. 4.
4. Cluster Vector Space Model: A Dimensionality Reduction Method for Text Classifications Based on the Vector Quantization [C]. Juxihong Julaiti, Soundar Kumara. Industrial and Systems Engineering Conference, 2017.
5. Text clustering and active learning using a LSI subspace signature model and query expansion [D]. Weizhong Zhu. 2009.
6. Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension [O]. Yuanchao Liu, Ming Liu, Xin Wang.
7. Evaluation of the Vector Space Representation in Text-Based Gene Clustering [O]. P. Glenisson, P. Antal, J. Mathys. 2003.