Chinese Web Short Text Subject Clustering Based on Similarity Upper Approximation

Abstract

In this paper, we propose a Web short text clustering method based on a modified Similarity Upper Approximation algorithm. After the initial text modeling, we reduce the dimension of the text feature-word matrix by singular value decomposition. After clustering is complete, we extract the most frequent words in each text cluster to represent the subject of that cluster. The clustering process does not require the number of clusters to be specified in advance, which makes it suitable for Web short text collections that are constantly updated and whose number of clusters cannot be known beforehand. To make the cluster number more accurate, we add two steps to the original algorithm: merging clusters based on their average similarity, and outlier detection. Experiments show that the modified algorithm outperforms the K-means algorithm and the hierarchical clustering algorithm in clustering accuracy, and yields a more accurate cluster number than the original algorithm.
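
The abstract does not give the algorithm's details, so the following is only a minimal sketch of one plausible reading of the pipeline, not the authors' implementation. It assumes texts are already segmented into word lists, uses cosine similarity on raw term counts, and omits the SVD dimension-reduction step to stay within the Python standard library. The function names, both thresholds, the order of merging and outlier detection, and the rule "a singleton cluster is an outlier" are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def upper_approximation_clusters(vecs, threshold):
    """Grow each text's similarity neighbourhood until it stops changing."""
    n = len(vecs)
    # First approximation: texts whose similarity to text i meets the threshold.
    neigh = [frozenset(j for j in range(n)
                       if j == i or cosine(vecs[i], vecs[j]) >= threshold)
             for i in range(n)]
    current = list(neigh)
    while True:
        # Upper approximation of a set: union of its members' neighbourhoods.
        grown = [frozenset().union(*(neigh[j] for j in s)) for s in current]
        if grown == current:
            break
        current = grown
    # The distinct stable sets are taken as clusters (no cluster count needed).
    return sorted(set(current), key=min)

def average_similarity(vecs, a, b):
    """Mean pairwise cosine similarity between members of clusters a and b."""
    return sum(cosine(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))

def merge_clusters(vecs, clusters, merge_threshold):
    """Assumed merge rule: join two clusters whose average similarity is high enough."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if average_similarity(vecs, clusters[x], clusters[y]) >= merge_threshold:
                    clusters[x] |= clusters.pop(y)
                    merged = True
                    break
            if merged:
                break
    return clusters

def top_words(docs, cluster, k):
    """The k most frequent words across the texts of one cluster."""
    counts = Counter(w for i in cluster for w in docs[i])
    return [w for w, _ in counts.most_common(k)]

def cluster_short_texts(docs, threshold=0.3, merge_threshold=0.2, k=2):
    vecs = [Counter(d) for d in docs]
    clusters = merge_clusters(vecs, upper_approximation_clusters(vecs, threshold),
                              merge_threshold)
    # Assumed outlier rule: a cluster with a single text is treated as an outlier.
    outliers = [c for c in clusters if len(c) == 1]
    kept = [c for c in clusters if len(c) > 1]
    topics = [top_words(docs, c, k) for c in kept]
    return kept, outliers, topics

# Toy, pre-segmented texts (hypothetical data).
docs = [
    ["phone", "battery", "phone", "screen"],
    ["phone", "screen", "price"],
    ["football", "match", "goal"],
    ["football", "goal", "team", "football"],
    ["weather"],
]
kept, outliers, topics = cluster_short_texts(docs)
print(kept, outliers, topics)
```

On this toy input the two phone texts and the two football texts form separate clusters, and the lone weather text is set aside as an outlier. A fuller version would segment Chinese text with a word segmenter and apply SVD to the feature-word matrix before computing similarities, as the abstract describes.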

Bibliographic details

Source
2017 International Conference on Computer Systems, Electronics and Control | 2017 | pp. 1307-1310 | 4 pages
Conference location: Dalian (CN)
Authors
JiaWei Zhu; YunHua Zhang;
Author affiliations

Information Institute of Zhejiang Sci-Tech University, Hangzhou, China;

Information Institute of Zhejiang Sci-Tech University, Hangzhou, China;
Original format: PDF
Text language: English
Keywords
Clustering algorithms; Approximation algorithms; Matrix decomposition; Heuristic algorithms; Classification algorithms; Partitioning algorithms; Computational modeling;

Similar literature

1. BTM and GloVe Similarity Linear Fusion-Based Short Text Clustering Algorithm for Microblog Hot Topic Discovery [J]. Wu Di, Zhang Mengtian, Shen Chao. Quality Control, Transactions. 2020
2. An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text [J]. M. John Basha, K.P. Kaliyamurthie. International Journal of Electrical and Computer Engineering. 2017, No. 1
3. Similarity upper approximation-based clustering for recommendation system [J]. Rajhans Mishra, Pradeep Kumar. International Journal of Business Information Systems. 2017, No. 1
4. Chinese Web Short Text Subject Clustering Based on Similarity Upper Approximation [C]. JiaWei Zhu, YunHua Zhang. International Conference on Computer Systems, Electronics and Control. 2017
5. Ontology-based similarity for clustering in text space [D]. Assem, Nasser. 2002
6. Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches [O]. Kevin W. Boyack, David Newman, Russell J. Duhon. 2011
7. Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space [O]. Liqiang Pan, Pu Zhang, Anping Xiong. 2015
