Asia Information Retrieval Symposium (AIRS 2005); 13-15 October 2005; Jeju Island (KR)

On the Chinese Document Clustering Based on Dynamical Term Clustering


Abstract

With the rapid development of global networking, more and more information is accessible online, which makes document clustering techniques increasingly indispensable. Clustering lets us browse large collections of information efficiently. In this paper, we focus on a Chinese document clustering process that uses data mining techniques and a neural network model. The process has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose an alternative Chinese sentence segmentation method based on a data mining technique that uses a hash-based method. In the clustering phase, we adopt the dynamical SOM model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering yields a better precision rate and becomes more efficient as the number of documents gradually grows.
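To make the distinction between term vectors and document vectors concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the documents have already been segmented into tokens, so the hash-based segmentation step is not shown. It also substitutes a small fixed-size one-dimensional self-organizing map for the dynamical SOM named in the abstract. The toy corpus and the function names (term_vectors, normalize, best_node, train_som) are hypothetical.

```python
import math
import random


def term_vectors(docs):
    """Map each term to a vector of its counts, one slot per document."""
    vocab = sorted({t for d in docs for t in d})
    return {t: [float(d.count(t)) for d in docs] for t in vocab}


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def best_node(weights, v):
    """Index of the node whose weight vector is closest to v (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], v)),
    )


def train_som(vectors, n_nodes=3, epochs=50, lr0=0.5, seed=0):
    """Train a fixed-size 1-D SOM (a stand-in, not the paper's dynamical SOM)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly
        radius = (1.0 - frac) * n_nodes / 2  # neighbourhood shrinks over time
        for v in vectors:
            bmu = best_node(weights, v)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood around the best-matching node
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Hypothetical, already-segmented toy corpus (one token list per document).
docs = [
    ["文档", "聚类", "聚类"],
    ["神经", "网络", "聚类"],
    ["神经", "网络", "模型"],
]

tv = term_vectors(docs)
terms = list(tv)
vecs = [normalize(tv[t]) for t in terms]
weights = train_som(vecs)
assignment = {t: best_node(weights, v) for t, v in zip(terms, vecs)}
```

In this sketch each term is described by where it occurs across the collection, so terms with the same occurrence pattern (here 神经 and 网络) always map to the same node. How term clusters are then turned into document clusters is not described in the abstract and is left out.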
