Distance Dimension Reduction on Singular Value Decomposition for Efficient Clustering Semantic XML Document Using the SVD Fuzzy C-Mean (SVD-FCM)

Conference: International Conference on Information Knowledge Engineering (IKE'09)

Abstract

The rapid growth in XML adoption has created a need for a proper representation of semi-structured documents. Such a representation must take the documents' semantic and structural information into account so as to support more precise document analysis. To analyze the information represented in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise the similarity measure between XML documents that clustering will use. Because XML documents have a hierarchical structure, it is not appropriate to cluster them with a general document similarity measure. Dimension reduction plays an important role in handling massive quantities of high-dimensional data, such as large collections of semantically structured documents. In this paper, we introduce distance dimension reduction (DDR) based on the singular value decomposition (DDR/SVD). DDR generates lower-dimensional representations of high-dimensional XML documents. These representations exactly preserve the Euclidean distances and cosine similarities between any pair of XML documents in the original space. After the XML documents are projected into the lower-dimensional space obtained from DDR, our proposed method runs fuzzy c-means (FCM) clustering on them; we call this method SVD-FCM. DDR can substantially reduce the computing time and/or memory requirements of a given document-analysis clustering algorithm. The savings are largest when the algorithm must be run many times to estimate parameters or to search for a better solution.
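The exactness claim can be illustrated with a minimal sketch. The abstract does not give the paper's DDR construction or its XML encoding, so the code makes two assumptions. First, each document is already a numeric feature vector, one row of an n x d matrix X. Second, DDR is taken to be the textbook projection onto the row space of X via the thin SVD.

If X = U_r Σ_r V_r^T has rank r, then every row of X lies in the span of the orthonormal columns of V_r. The reduced coordinates Z = X V_r = U_r Σ_r therefore satisfy ||z_i − z_j|| = ||x_i − x_j|| and z_i·z_j = x_i·x_j. Distances and cosine similarities are kept, while the dimension drops from d to r ≤ min(n, d). The clustering step is standard fuzzy c-means with fuzzifier m = 2. The function names ddr_svd and fuzzy_c_means, the parameters tol, m, n_iter and seed, and the demo data are all hypothetical.

```python
import numpy as np


def ddr_svd(X, tol=1e-10):
    """Distance-preserving reduction via thin SVD (a sketch, not the paper's code).

    X: n x d matrix, one document per row (assumed already vectorized).
    Returns Z (n x r, r = numerical rank of X) with Z @ Z.T == X @ X.T up to
    rounding, so pairwise Euclidean distances and cosine similarities match.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if s.size == 0 or s[0] == 0:
        return np.zeros((X.shape[0], 0))
    r = int(np.sum(s > tol * s[0]))      # drop numerically zero directions
    return U[:, :r] * s[:r]              # equals X @ Vt[:r].T


def fuzzy_c_means(Z, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on the rows of Z.

    Returns (memberships n x c, centers c x r); each membership row sums to 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((Z.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)    # random initial fuzzy partition
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ Z) / w.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                          # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)   # u_ik proportional to d_ik^(-2/(m-1))
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return u, centers


# Hypothetical demo data: 6 "documents" x 1000 features in two groups.
rng = np.random.default_rng(42)
cols = np.arange(1000)
X = np.vstack([
    rng.random((3, 1000)) + 2.0 * (cols < 500),
    rng.random((3, 1000)) + 2.0 * (cols >= 500),
])
Z = ddr_svd(X)                           # at most 6 columns instead of 1000
memberships, centers = fuzzy_c_means(Z, c=2)
print(Z.shape)
print(memberships.round(3))
```

Only the n x r matrix Z is kept, so each FCM iteration costs O(n·c·r) instead of O(n·c·d). Repeated runs with different c, m or seeds then benefit from the reduced size. This is the kind of saving the abstract describes, provided r is much smaller than d.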