International Conference on Advanced Science and Engineering

Clustering Documents based on Semantic Similarity using HAC and K-Mean Algorithms


Abstract

The continuing success of the Internet has greatly increased the number of text documents available in electronic form, and techniques for grouping these documents into meaningful collections have become critical. Traditional methods group documents by statistical features and therefore rely on syntactic rather than semantic information. This article introduces a new method for grouping documents based on semantic similarity. Document summaries are taken from Wikipedia and IMDB datasets and preprocessed with NLTK. A vector space model is then built with TF-IDF weighting, and clustering is performed using the HAC (hierarchical agglomerative clustering) and K-means algorithms. The results are compared and visualized as an interactive web page.
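The pipeline described in the abstract (TF-IDF vectors, then HAC and K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, and several details are assumptions made for illustration, including the simple `tokenize` function standing in for the NLTK preprocessing, the smoothed IDF formula, cosine similarity as the measure, average linkage for HAC, farthest-first seeding for K-means, and the made-up sample texts. It also assumes `k` does not exceed the number of documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-letters; a simple stand-in for NLTK preprocessing.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
        vec = {t: tf * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
               for t, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Inputs are unit length, so the dot product equals the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def hac(vectors, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        _, a, b = max((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def kmeans(vectors, k, iterations=20):
    """K-means on unit vectors: assign by cosine similarity, then re-average."""
    # Deterministic farthest-first seeding: start at document 0, then repeatedly
    # add the document least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s])
                                                 for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # Assignments stopped changing.
        labels = new_labels
        for c in range(k):
            total = Counter()
            for i, label in enumerate(labels):
                if label == c:
                    total.update(vectors[i])  # Adds the float weights per term.
            if total:
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


if __name__ == "__main__":
    # Hypothetical sample summaries, not taken from the Wikipedia or IMDB data.
    docs = [
        "The spaceship crew explores a distant planet",
        "A spaceship lands on the red planet",
        "The detective investigates a murder in the city",
        "A murder mystery puzzles the city detective",
    ]
    vectors = tfidf_vectors(docs)
    print("HAC labels:    ", hac(vectors, 2))
    print("K-means labels:", kmeans(vectors, 2))
```

On this toy input, both functions should place the two space-themed texts in one cluster and the two crime-themed texts in the other. For real collections, a library implementation of TF-IDF and clustering would normally be preferable to this quadratic-time sketch.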
