Annual Workshop on Semantic Web and Ontology (SWON2006)

Using ontology semantics to improve text documents clustering



Abstract

To improve clustering results and the selection among them, ontology semantics are combined with document clustering. A new WordNet-based document clustering algorithm, applied in the document processing phase, is proposed. First, after the documents are represented as tf-idf vectors, each word vector is extended with new entities. Then a feature extraction algorithm is applied to the documents. Finally, an ontology aggregation clustering (OAC) algorithm is proposed to improve the document clustering result. Experiments are based on the data set of Reuters 20 News Group, and the experimental results are compared with those obtained by mutual information (MI). The results indicate that the proposed ontology-based document clustering algorithm outperforms existing clustering algorithms such as MNB, CLUTO, and co-clustering.
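The vector-extension step described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the `TOY_CONCEPTS` table is a hypothetical stand-in for WordNet lookups, the smoothed idf formula and the rule of adding a term's weight to its broader concept are assumptions, and the feature extraction and OAC clustering steps are not shown. It only demonstrates why adding ontology entities to tf-idf vectors can make related documents look more similar.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to broader concepts.
TOY_CONCEPTS = {
    "dog": ["animal"],
    "cat": ["animal"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
}

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption), so no term gets weight zero.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors

def extend_with_concepts(vec):
    """Add each term's weight to its broader concepts (assumed rule)."""
    out = dict(vec)
    for term, weight in vec.items():
        for concept in TOY_CONCEPTS.get(term, []):
            out[concept] = out.get(concept, 0.0) + weight
    return out

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["dog", "barks"], ["cat", "sleeps"], ["car", "drives"]]
plain = tfidf(docs)
extended = [extend_with_concepts(v) for v in plain]

print("dog/cat before extension:", cosine(plain[0], plain[1]))
print("dog/cat after extension: ", cosine(extended[0], extended[1]))
print("dog/car after extension: ", cosine(extended[0], extended[2]))
```

In this toy data the two animal documents share no words, so their plain tf-idf similarity is zero; after extension they share the concept "animal" and become similar, while the vehicle document stays dissimilar.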
