Conference: IEEE Conference on Open Systems

Malay document clustering using complete linkage clustering technique with Cosine Coefficient

Abstract

Finding useful and relevant information is a very challenging task for users. A retrieval system usually responds with a long list of documents that are not necessarily relevant to the user's need. Document clustering is a technique that groups documents so that documents in the same cluster are similar to each other and documents in different clusters are dissimilar. This paper focuses on document clustering for a Malay test collection consisting of 2028 Malay-translated Hadith documents from the book Sahih Bukhari. It presents the results of applying the Complete Linkage Clustering algorithm with the Cosine Coefficient to these documents. The experiments are evaluated with the Recall (R), Precision (P) and Effectiveness (E) measures and are conducted with 100 clusters, 50 clusters and 20 clusters. The results show that the smaller the size of the clusters, the higher the Recall (R) and the lower the Precision (P). Comparing the Effectiveness (E) measure against the non-clustered documents shows that applying the clustering algorithm improves the effectiveness of the search process. In this experiment, 20 clusters is rather effective compared with the other settings.
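
The abstract names the method and the measures but gives no formulas or implementation details, so the following is only a minimal sketch of complete linkage clustering with the cosine coefficient, not the authors' code. Several choices are assumptions made for illustration: whitespace tokenization, raw term-frequency vectors, the function names `cosine`, `complete_linkage` and `evaluate`, and van Rijsbergen's formulation of the E measure, E = 1 - (1 + b^2)PR / (b^2 P + R), with b = 1 by default.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine coefficient between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_linkage(docs, k):
    """Merge clusters bottom-up until k remain.

    Under complete linkage, the similarity of two clusters is the
    similarity of their LEAST similar pair of documents.
    Tokenization by whitespace is an assumption of this sketch.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > max(k, 1):
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = min(sim[i][j] for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        x, y = pair
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


def evaluate(retrieved, relevant, beta=1.0):
    """Return (recall, precision, E). Lower E means better retrieval.

    The E formula is van Rijsbergen's; the abstract does not state
    which definition the paper uses, so this is an assumption.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    denom = beta ** 2 * p + r
    f = (1 + beta ** 2) * p * r / denom if denom else 0.0
    return r, p, 1.0 - f


# Toy documents (hypothetical, not from the Hadith collection).
docs = ["solat subuh solat", "solat zuhur", "puasa ramadan", "puasa ramadan zakat"]
clusters = complete_linkage(docs, k=2)
recall, precision, e = evaluate(retrieved=clusters[0], relevant=[0])
print(clusters, recall, precision, e)
```

In this sketch a whole cluster is treated as the retrieved set for a query, which shows why larger clusters tend to raise Recall and lower Precision: more relevant documents are captured, along with more irrelevant ones. The pairwise search is cubic in the number of documents and is meant for illustration only.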