Conference: Machine learning and data mining in pattern recognition

New Semantic Similarity Based Model for Text Clustering Using Extended Gloss Overlaps

Abstract

Most text clustering techniques are based on the weights of words and/or phrases in the text. Such a representation is often unsatisfactory because it ignores the relationships between terms and treats them as independent features.

In this paper, a new semantic similarity based model (SSBM) is proposed. The model computes semantic similarities by using WordNet as an ontology. It captures the semantic similarity between documents that contain terms which are semantically similar but not necessarily syntactically identical.

The semantic similarity based model assigns a new weight to document terms, reflecting the semantic relationships between terms that literally co-occur in the document. In conjunction with the extended gloss overlaps measure and the adapted Lesk algorithm, the model resolves ambiguity and synonymy problems that traditional term frequency based text mining techniques do not detect.

The proposed model is evaluated on the Reuters-21578 and 20-Newsgroups text collections. Performance is assessed in terms of the F-measure, Purity and Entropy quality measures. The results show promising performance improvements over the traditional term based vector space model (VSM), as well as over other existing methods that include semantic similarity measures in text clustering.
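As a rough illustration of two of the quality measures named above, the following sketch computes Purity and Entropy for a clustering against reference class labels. It uses common textbook definitions, which is an assumption: the abstract does not state the exact formulas the paper uses (for example, whether Entropy is normalized by the number of classes). The function name `purity_and_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference classes.

    Assumed definitions (not taken from the paper):
      purity  = sum over clusters of (size of majority class) / n
      entropy = sum over clusters of (cluster size / n) * H(cluster),
                where H is the base-2 Shannon entropy of the class
                distribution inside the cluster (unnormalized).
    """
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count the reference classes that fall into each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1

    n = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for counts in members.values():
        size = sum(counts.values())
        # The majority class determines this cluster's purity contribution.
        purity += max(counts.values()) / n
        # Class-distribution entropy inside the cluster, weighted by size.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


if __name__ == "__main__":
    # Toy data: six documents, two clusters, two reference classes.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["earn", "earn", "grain", "grain", "grain", "grain"]
    print(purity_and_entropy(clusters, labels))
```

Under these definitions, higher Purity and lower Entropy indicate clusters that agree more closely with the reference classes.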
