Journal of Southeast University

Using ontology semantics to improve text documents clustering


Abstract

To improve clustering results and the selection among them, ontology semantics are combined with document clustering. A new WordNet-based document clustering algorithm is proposed for the document processing phase. First, after the documents are represented by tf-idf, every word vector is extended with new entities. Then a feature extraction algorithm is applied to the documents. Finally, the ontology aggregation clustering (OAC) algorithm is proposed to improve the document clustering result. Experiments are based on the Reuters 20 News Group data set, and the experimental results are compared with those obtained by mutual information (MI). The conclusion is that the proposed ontology-based document clustering algorithm performs better than other existing clustering algorithms such as MNB, CLUTO, and co-clustering.
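The first step described above (tf-idf representation followed by extending each word vector with new entities) can be illustrated with a minimal sketch. This is not the paper's OAC algorithm: the `TOY_ONTOLOGY` mapping is a hypothetical stand-in for a WordNet lookup, the smoothed idf formula is one common choice rather than the authors', and the function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: term -> broader concept.
TOY_ONTOLOGY = {"car": "vehicle", "truck": "vehicle", "apple": "fruit"}

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (smoothed idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def extend_with_concepts(vector, ontology):
    """Add each term's weight to its ontology concept, keeping the original terms."""
    extended = dict(vector)
    for term, weight in vector.items():
        concept = ontology.get(term)
        if concept is not None:
            extended[concept] = extended.get(concept, 0.0) + weight
    return extended

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

docs = [["car", "road"], ["truck", "highway"], ["apple", "tree"]]
plain = tfidf(docs)
extended = [extend_with_concepts(v, TOY_ONTOLOGY) for v in plain]
```

In this toy data the first two documents share no surface terms, so their plain tf-idf vectors are orthogonal; after extension both carry the shared concept `vehicle`, which gives a clustering step something to group them on.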
