Improving Precision of Inter-Document Similarity Measure by Clustering SVD

Abstract

Text representation, a fundamental and necessary step in intelligent text processing, is the process of determining index terms for documents and converting the documents into numeric vectors using those index terms. LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) was proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is often criticized for low discriminative power in representing documents, although its representational quality has been validated. In this paper, clustering SVD, in which SVD is performed on text clusters rather than on the whole term-document matrix, is proposed to improve the discriminative power of SVD-based latent semantic indexing. The key idea of clustering SVD is to first cluster the texts in a collection and then carry out SVD on the resulting text clusters. We conjecture that the clustering computation involved in SVD will improve the statistical quality of the index terms produced by latent semantic indexing. A Chinese corpus and an English corpus are each used to examine the clustering SVD method. The experiments show that the proposed method improves the precision of the inter-document similarity measure in comparison with classic SVD-based LSI. Moreover, its advantage over classic SVD-based LSI becomes more significant as the preservation rate set for matrix approximation decreases.
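The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used, how the per-cluster results are combined, or how the preservation rate is defined. The sketch therefore assumes that cluster labels are supplied by some external clustering step, that each cluster's low-rank approximation is written back into its own columns, and that the preservation rate is the fraction of the sum of singular values that is retained. All function names (`truncated_svd_approx`, `clustering_svd`, `cosine_similarity_matrix`) and the toy matrix are hypothetical.

```python
import numpy as np


def truncated_svd_approx(A, preservation_rate):
    """Low-rank approximation of A.

    Keeps the smallest number of singular values whose cumulative sum
    reaches `preservation_rate` of the total (an assumed definition).
    """
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if s.sum() == 0:
        return np.zeros_like(A)
    cumulative = np.cumsum(s) / s.sum()
    # Index of the first cumulative share that reaches the requested rate.
    k = int(np.searchsorted(cumulative, preservation_rate - 1e-12)) + 1
    k = min(k, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def clustering_svd(A, labels, preservation_rate):
    """Apply truncated SVD separately to the documents of each cluster.

    A is a term-document matrix (rows = terms, columns = documents);
    labels[j] is the cluster of document j, produced by any clustering
    method (assumed to be given here).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    A_hat = np.zeros_like(A)
    for c in np.unique(labels):
        cols = np.flatnonzero(labels == c)
        A_hat[:, cols] = truncated_svd_approx(A[:, cols], preservation_rate)
    return A_hat


def cosine_similarity_matrix(X):
    """Cosine similarity between the document columns of X."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn.T @ Xn


# Toy term-document matrix: 6 terms x 6 documents (made-up counts).
A = np.array([
    [2, 1, 3, 0, 0, 1],
    [1, 2, 1, 0, 1, 0],
    [3, 1, 2, 1, 0, 0],
    [0, 0, 1, 2, 3, 1],
    [0, 1, 0, 1, 2, 3],
    [1, 0, 0, 3, 1, 2],
])
labels = [0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step
rate = 0.6

sim_cluster = cosine_similarity_matrix(clustering_svd(A, labels, rate))
sim_lsi = cosine_similarity_matrix(truncated_svd_approx(A, rate))
print(np.round(sim_cluster, 2))
print(np.round(sim_lsi, 2))
```

Comparing `sim_cluster` with `sim_lsi` at several values of `rate` gives a rough feel for how the two representations diverge as less of the matrix is preserved. Judging precision, as the paper does, would additionally require labelled similar-document pairs, which this sketch does not include.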


Bibliographic details

Source
International Symposium on Knowledge and Systems Sciences | 2008 | 7 pages
Authors
Wen Zhang; Taketoshi Yoshida; Xijin Tang
Original format
PDF
Chinese Library Classification
Information and knowledge dissemination
Keywords
text representation; LSI; SVD; clustering SVD; similarity measure

Similar literature


1. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure [J]. Zhang Wen, Xiao Fan, Li Bin. Computational Intelligence and Neuroscience. 2016, Issue Pta3
2. GO functional similarity clustering depends on similarity measure, clustering method, and annotation completeness [J]. Meng Liu, Paul D. Thomas. BMC Bioinformatics. 2019, Issue 1
3. Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure [J]. Fan Yang, Qing-Xin Zhu, Dong-Ming Tang. Evolutionary Bioinformatics. 2010, Issue 4
4. Improving Precision of Inter-Document Similarity Measure by Clustering SVD [C]. Wen Zhang, Taketoshi Yoshida, Xijin Tang. The 9th international symposium on knowledge and systems sciences jointly with 4th Asia-Pacific international conference on knowledge management. 2008
5. A comparison of clustering procedures and similarity measures in creating clusters using warp functions [D]. Elguindi, Anne Charlotte. 2010
6. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure [O]. Wen Zhang, Fan Xiao, Bin Li. 2016
7. Improving the Performance of SVM Text Categorization with Inter-document Similarities [O]. 2005
8. A NEW MEASURE OF BIOTIC SIMILARITY BETWEEN SAMPLES AND ITS APPLICATIONS WITH A CLUSTER ANALYSIS PROGRAM [R]. Carlos F. A. Pinkham. 1974
