Conference: International Workshop on Knowledge Discovery from XML Documents

XML Document Clustering by Independent Component Analysis



Abstract

Clustering XML documents runs into the high-dimensionality problem. Independent Component Analysis (ICA) can reduce dimensionality and, at the same time, uncover the latent variables underlying XML structures, which improves clustering quality. This paper proposes a novel strategy for clustering XML documents based on ICA. Each document is first represented in the Vector Space Model (VSM) using the D_path features extracted from its XML tree. ICA is then applied to reduce the dimensionality of the document vectors. Finally, the document vectors are clustered in the reduced Euclidean space spanned by the independent components. Experiments show that ICA enhances clustering accuracy with stable performance.
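The first step of the pipeline, turning XML trees into VSM vectors, can be illustrated with a minimal sketch. The abstract does not define D_path, so the code below assumes, purely for illustration, that a feature is a root-to-node tag path and that its weight is a raw occurrence count. The function names `tag_paths` and `to_vectors` and the sample documents are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count root-to-node tag paths in one document (assumed stand-in for D_path)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


def to_vectors(docs):
    """Build a shared path vocabulary and one count vector per document."""
    per_doc = [tag_paths(d) for d in docs]
    vocab = sorted(set().union(*per_doc))
    # Counter returns 0 for paths that a document does not contain.
    vectors = [[counts[p] for p in vocab] for counts in per_doc]
    return vocab, vectors


# Hypothetical toy corpus: two "book" documents and one "article" document.
docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><author/></book>",
    "<article><title/><journal/></article>",
]
vocab, vectors = to_vectors(docs)
```

Even on this toy corpus, the vocabulary grows with every distinct path, which is the high-dimensionality problem the abstract refers to. In a fuller pipeline the vectors would then be passed to an ICA implementation (for example scikit-learn's `FastICA`) and clustered in the reduced space. The specific ICA variant and clustering algorithm used by the authors are not stated in the abstract.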
