Dynamic Document Clustering Using Singular Value Decomposition

Abstract

Document clustering is a widely researched area in data mining. It is a technique for grouping similar documents based on a measure of similarity. Document clustering is an important part of information retrieval: it improves precision and recall in search applications and supports the navigation and presentation of search results. However, because of the very large number of features, textual data suffers from the "curse of dimensionality". Moreover, adding new features increases the noise in the data. To address these issues, in this thesis we investigate the use of Singular Value Decomposition (SVD) and propose a document clustering algorithm that combines the folding-in method with the k-means algorithm to efficiently store new textual data and dynamically incorporate it into the existing cluster formations. We test our approach by introducing new documents in increments of 1%, 5%, 10%, 15%, and 20%. These new documents are added in two variations: one document set consists of completely new documents, and the other is formed by modifying existing documents. Our method promises significant improvements in computation cost, storage cost, and cluster quality compared with the recomputing-SVD method.

We also present a novel approach for retrieving documents of interest to users. The user can choose documents using different window sizes, either time windows or subsets of documents. Our experimental evaluations show that the proposed document retrieval method significantly outperforms the recomputing-SVD method in computation time, while promising flexibility and good cluster quality.
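The combination of SVD, folding-in, and k-means described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it uses the standard latent semantic indexing folding-in projection (d_hat = d^T U_k S_k^{-1}), a plain Lloyd-style k-means, and a toy 4-term by 4-document matrix. All function names (`truncated_svd`, `fold_in`, `kmeans`, `assign_to_centroids`) and the choice to assign a folded-in document to its nearest existing centroid are assumptions made for illustration.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k SVD of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # U_k: terms x k, s_k: k singular values, V_k: docs x k
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(D_new, U_k, s_k):
    """Project new document columns (terms x n_new) into the existing
    k-dimensional space without recomputing the SVD:
    d_hat = d^T U_k S_k^{-1}."""
    return (D_new.T @ U_k) / s_k

def assign_to_centroids(X, centroids):
    """Index of the nearest centroid (Euclidean) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd-style k-means with a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct data points (fancy indexing copies).
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = assign_to_centroids(X, centroids)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return assign_to_centroids(X, centroids), centroids

# Toy data (hypothetical): rows are terms, columns are documents.
# Documents 0-1 use only terms 0-1; documents 2-3 use only terms 2-3.
A = np.array([
    [2.0, 3.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 2.0, 2.0],
])

U_k, s_k, V_k = truncated_svd(A, k=2)          # initial decomposition
labels, centroids = kmeans(V_k, n_clusters=2)  # cluster existing documents

# A new document that uses terms 0 and 1 only.
D_new = np.array([[1.0], [1.0], [0.0], [0.0]])
new_coords = fold_in(D_new, U_k, s_k)          # no SVD recomputation
new_labels = assign_to_centroids(new_coords, centroids)
```

Folding-in reuses the existing basis `U_k`, which is why it is cheaper than recomputing the SVD; the trade-off is that the basis itself is not updated by the new documents.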

Bibliographic record

Author: Ramesh, Rashmi Nadubeedi
Author affiliation: University of Maryland, Baltimore County
Degree-granting institution: University of Maryland, Baltimore County
Subject: Information Technology; Information Science
Degree: M.S.
Year: 2011
Pages: 86 p.
Format: PDF
Language: English
