On the Chinese Document Clustering Based on Dynamical Term Clustering


Abstract

With the rapid development of global networking, more and more information is accessible online, which makes document clustering techniques increasingly indispensable. Clustering lets us browse large volumes of information efficiently. In this paper, we focus on a Chinese document clustering process that uses a data mining technique and a neural network model. The process has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose an alternative Chinese sentence segmentation method based on a hash-based data mining technique. In the clustering phase, we adopt the dynamical SOM model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering yields a better precision rate, and that term clustering becomes more efficient as the number of documents gradually grows.
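The idea of clustering term vectors rather than document vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the growth rule of the dynamical SOM, so a small fixed-size one-dimensional SOM stands in for it, and the hash-based segmentation step is skipped by assuming the documents are already segmented into terms. All function names (`term_vectors`, `normalize`, `best_unit`, `train_som`) and the toy documents are hypothetical.

```python
import math

def term_vectors(docs):
    """Map each term to its occurrence counts across documents
    (one row of the term-by-document matrix per term)."""
    vocab = sorted({t for d in docs for t in d})
    return {t: [d.count(t) for d in docs] for t in vocab}

def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def best_unit(units, v):
    """Index of the unit whose weights are closest to v (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((w - x) ** 2 for w, x in zip(units[j], v)))

def train_som(vectors, n_units, epochs=50, lr0=0.5):
    """Train a fixed-size 1-D SOM (assumption: stands in for the dynamical SOM).
    Units are initialised deterministically from evenly spaced input vectors."""
    step = max(1, len(vectors) // n_units)
    units = [list(vectors[(i * step) % len(vectors)]) for i in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)           # linearly decaying learning rate
        radius = 1 if epoch < epochs // 2 else 0  # neighbourhood shrinks halfway through
        for v in vectors:
            bmu = best_unit(units, v)
            for j in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
                units[j] = [w + lr * (x - w) for w, x in zip(units[j], v)]
    return units

# Toy corpus, assumed to be already segmented into terms.
docs = [
    ["股票", "市场", "上涨"],
    ["股票", "市场", "下跌"],
    ["球队", "比赛", "胜利"],
    ["球队", "比赛", "失败"],
]

tv = term_vectors(docs)
terms = list(tv)
vectors = [normalize(tv[t]) for t in terms]
units = train_som(vectors, n_units=2)
clusters = {t: best_unit(units, v) for t, v in zip(terms, vectors)}
print(clusters)
```

Each term vector here has one dimension per document, whereas a document vector would have one dimension per term. Terms with identical occurrence patterns, such as 股票 and 市场, always map to the same unit.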


Bibliographic details

Source: Asia Information Retrieval Symposium (AIRS 2005); October 13-15, 2005; Jeju Island (KR) | 2005 | pp. 534-539 | 6 pages
Conference location: Jeju Island (KR)
Authors: Chih-Ming Tseng; Kun-Hsiu Tsai; Chiun-Chieh Hsu; His-Cheng Chang
Author affiliations: Department of Information Management, National Taiwan University of Science and Technology; Department of Information Management, Jin-Wen Institute of Technology, Taipei, Taiwan
Original format: PDF
Language of text: English
CLC classification: Data backup and recovery
Date added to database: 2022-08-26 13:56:29

