Journal: Annals of Mathematics and Artificial Intelligence

Semantic string operation for specializing AHC algorithm for text clustering

Abstract

This article proposes a modified AHC (Agglomerative Hierarchical Clustering) algorithm that clusters string vectors, instead of numerical vectors, as an approach to text clustering. Two facts motivate this research: string vector based algorithms were applied successfully to text clustering in previous works, and a synergy effect is expected from combining text clustering with word clustering. We define an operation on string vectors called semantic similarity, and modify the AHC algorithm by adopting it as the similarity metric. The proposed AHC algorithm is empirically validated as a better approach for clustering texts in news articles and opinions. More operations on string vectors still need to be defined and characterized mathematically in order to modify more advanced machine learning algorithms.
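
The abstract does not spell out the semantic similarity operation or the linkage rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a string vector is a fixed-length tuple of words, that semantic similarity is the average of element-wise word similarities looked up in a word similarity table, and that clusters are merged by average linkage. The table `WORD_SIM`, its values, and every function name are hypothetical and are not taken from the article.

```python
from itertools import combinations

# Hypothetical symmetric word-to-word similarities; the values are invented
# for illustration and do not come from the article.
WORD_SIM = {
    frozenset(("stock", "market")): 0.8,
    frozenset(("bank", "loan")): 0.7,
    frozenset(("goal", "match")): 0.6,
    frozenset(("coach", "player")): 0.7,
}


def word_similarity(a, b):
    """Look up the similarity of two words; identical words score 1.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def semantic_similarity(u, v):
    """Assumed operation: average element-wise similarity of two string vectors."""
    if len(u) != len(v):
        raise ValueError("string vectors must have the same length")
    return sum(word_similarity(a, b) for a, b in zip(u, v)) / len(u)


def cluster_similarity(c1, c2, vectors):
    """Average-link similarity between two clusters of vector indices."""
    total = sum(semantic_similarity(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(vectors, num_clusters):
    """Start from singleton clusters and merge the most similar pair each step."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


# Each text is represented by a string vector of three words.
texts = [
    ("stock", "market", "bank"),
    ("market", "stock", "loan"),
    ("goal", "team", "coach"),
    ("match", "team", "player"),
]
result = sorted(sorted(c) for c in ahc(texts, 2))
print(result)
```

With this toy table the two finance-like vectors end up in one cluster and the two sports-like vectors in the other. The only change relative to a conventional AHC is the similarity function: no numerical feature vectors or distance metrics are involved, which is the substitution the abstract describes.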
