Conference: International Conference on Information Knowledge Engineering

String Vector based AHC Algorithm for Word Clustering from News Articles


Abstract

In this research, we propose a version of the AHC (agglomerative hierarchical clustering) algorithm in which words are encoded into string vectors, instead of numerical vectors, as an approach to word clustering. Previous works showed that encoding texts into string vectors yields better text categorization performance than encoding them into numerical vectors, and we need to reinforce both word clustering and text clustering by connecting them with each other. In this research, words are encoded into string vectors, each of which consists of text identifiers; the semantic similarity between two string vectors is defined as an operation on string vectors; and the AHC algorithm is modified by adopting the proposed similarity metric. We adopt the clustering index as the evaluation metric and validate empirically that the proposed AHC version performs better than the traditional version. In future work, we will connect word clustering with text clustering so that they reinforce each other at the same time.
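
The pipeline described in the abstract (encode words as string vectors of text identifiers, define a similarity on those vectors, run AHC with that similarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the abstract does not define the similarity operation, so the Jaccard overlap of text identifiers used here is a stand-in assumption, as are the average-link merge rule, the frequency-based encoding, and all function names (`encode_words`, `similarity`, `ahc`) and the `dim` parameter.

```python
from collections import Counter
from itertools import combinations


def encode_words(texts, dim=3):
    """Encode each word as a string vector: the ids of up to `dim` texts
    in which the word is most frequent (assumed encoding scheme)."""
    counts = {}  # word -> Counter mapping text id -> frequency
    for text_id, body in texts.items():
        for word in body.lower().split():
            counts.setdefault(word, Counter())[text_id] += 1
    vectors = {}
    for word, freq in counts.items():
        # Rank text ids by descending frequency, breaking ties by id.
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        vectors[word] = tuple(text_id for text_id, _ in ranked[:dim])
    return vectors


def similarity(u, v):
    """Stand-in similarity between two string vectors: Jaccard overlap of
    their text identifiers (an assumption, not the paper's definition)."""
    a, b = set(u), set(v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, vectors):
    """Average-link similarity between two clusters of words."""
    total = sum(similarity(vectors[x], vectors[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def ahc(vectors, k):
    """Agglomerative clustering: start from singleton clusters and merge
    the most similar pair until only k clusters remain."""
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy corpus with hypothetical text identifiers t1..t4.
    texts = {
        "t1": "stock market price",
        "t2": "market stock trade",
        "t3": "soccer goal match",
        "t4": "match goal league",
    }
    for cluster in ahc(encode_words(texts), k=2):
        print(sorted(cluster))
```

Only the `similarity` function would need to change to plug in a different operation on string vectors; the merge loop itself is the standard AHC procedure.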
