Distance Dimension Reduction on Singular Value Decomposition for Efficient Clustering Semantic XML Document Using the SVD Fuzzy C-Mean (SVD-FCM)


Abstract

The rapid adoption of XML has created a need for a proper representation of semi-structured documents, one that takes a document's semantic and structural information into account to support more precise analysis. To analyze the information in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise the similarity measure between XML documents that is used for clustering. Because XML documents are hierarchical, clustering them with a general-purpose document similarity measure is not appropriate. Dimension reduction plays an important role in handling large volumes of high-dimensional data, such as large collections of semantically structured documents. In this paper, we introduce distance dimension reduction (DDR) based on the singular value decomposition (DDR/SVD). DDR generates lower-dimensional representations of high-dimensional XML documents that exactly preserve the Euclidean distances and cosine similarities between any pair of XML documents in the original space. After projecting the XML documents onto the lower-dimensional space obtained from DDR, our proposed method applies fuzzy c-means to cluster them; we call this method SVD-FCM. DDR can substantially reduce the computing time and/or memory requirements of a given clustering algorithm, especially when the algorithm must be run many times to estimate parameters or to search for a better solution.
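
The distance-preservation claim and the two-stage pipeline can be illustrated with a minimal sketch. The abstract does not spell out how DDR is constructed, so the sketch assumes a features-by-documents matrix and takes the reduced representation to be the projection of each document onto the left singular vectors with nonzero singular values; under that assumption, pairwise Euclidean distances and cosine similarities are unchanged. The clustering step is the textbook fuzzy c-means update, not necessarily the authors' variant, and all function names (`ddr_svd`, `pairwise_dist`, `cosine_sim`, `fuzzy_c_means`) are hypothetical.

```python
import numpy as np


def ddr_svd(A, tol=1e-10):
    """Reduce a (features x documents) matrix to (rank x documents).

    Assumption: DDR is modeled as projection onto the left singular
    vectors whose singular values are numerically nonzero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    # diag(s[:r]) @ Vt[:r] equals U[:, :r].T @ A
    return s[:r, None] * Vt[:r]


def pairwise_dist(X):
    """Euclidean distances between all pairs of columns of X."""
    return np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)


def cosine_sim(X):
    """Cosine similarities between all pairs of columns of X."""
    Xn = X / np.linalg.norm(X, axis=0)
    return Xn.T @ Xn


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on the columns of X.

    Returns (centers, W): centers has shape (dim, c) and W has shape
    (c, n_docs), with each column of W summing to 1.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((c, X.shape[1]))
    W /= W.sum(axis=0)                      # random fuzzy partition
    for _ in range(n_iter):
        Wm = W ** m
        centers = (X @ Wm.T) / Wm.sum(axis=1)
        # d[k, j] = distance from document j to center k
        d = np.linalg.norm(X[:, None, :] - centers[:, :, None], axis=0)
        d = np.maximum(d, 1e-12)            # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        W_new = inv / inv.sum(axis=0)
        converged = np.abs(W_new - W).max() < tol
        W = W_new
        if converged:
            break
    return centers, W


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.random((500, 20))               # 500 features, 20 documents
    Y = ddr_svd(A)
    print("reduced shape:", Y.shape)
    print("max distance error:",
          np.abs(pairwise_dist(A) - pairwise_dist(Y)).max())
    centers, W = fuzzy_c_means(Y, c=3)
    print("hard labels:", W.argmax(axis=0))
```

In this sketch the reduced dimension equals the rank of the document matrix, which is at most the number of documents, so the saving is largest when the feature count far exceeds the collection size. Each fuzzy c-means iteration then works on the smaller matrix, which is where repeated runs would benefit.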


Bibliographic record

Source
International Conference on Information Knowledge Engineering (IKE'09) | 2009 | pp. 134-140 | 7 pages
Authors
Hsu-Kuang Chang; I-Chang Jou
Original format: PDF
Chinese Library Classification: Information processing
Keywords
singular value decomposition; distance dimension reduction; PEWF; PEIDF; PESSW; fuzzy C-mean; SVD-FCM


