Journal: Soft Computing

Document clustering using locality preserving indexing and support vector machines

Abstract

A method of document clustering based on locality preserving indexing (LPI) and support vector machines (SVM) is presented. The document space is generally high-dimensional, and clustering in such a space is often infeasible because of the curse of dimensionality. In this paper, LPI is used to project the documents into a lower-dimensional semantic space in which documents related to the same semantics lie close to each other. Then, using SVM, the vectors in the semantic space are mapped by a Gaussian kernel into a high-dimensional feature space, in which the minimal enclosing sphere is sought. When mapped back to the semantic space, this sphere can split into several disjoint components delimited by the support vectors, each enclosing a separate cluster of documents. Combining LPI and SVM yields not only higher clustering accuracy, obtained effectively and without supervision, but also better generalization properties. Extensive experiments are performed on the Reuters-21578 and TDT2 data sets.
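The two-stage pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the cosine-weighted k-nearest-neighbour graph, the centring and SVD step before the LPI eigenproblem, the Frank-Wolfe solver for the sphere's dual with no outliers allowed (soft-margin constant effectively C = 1), the standardisation of the embedding, and all function and parameter names (`lpi`, `svc_sphere`, `svc_labels`, `cluster_documents`, `n_neighbors`, `q`, `n_seg`) are hypothetical choices. The sphere and labelling steps follow the usual support vector clustering recipe: two documents share a cluster when the segment joining them stays inside the sphere, and clusters are the connected components of that relation.

```python
import numpy as np


def gaussian_kernel(A, B, q):
    """K[i, j] = exp(-q * ||A[i] - B[j]||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-q * d2)


def lpi(X, n_components=2, n_neighbors=5):
    """Project documents (rows of X, nonnegative term weights) to a low-dim space."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)    # unit-length documents
    n = X.shape[0]
    # Symmetric kNN graph with cosine-similarity weights (assumed weighting).
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)                         # exclude self-matches
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(-S[i])[:n_neighbors]
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W                                            # graph Laplacian
    # Centre and drop the SVD null space: the constant vector leaves the
    # search space and Z^T D Z becomes (numerically) invertible.
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))
    Z = Xc @ Vt[:r].T
    # Generalised eigenproblem Z^T L Z a = lam Z^T D Z a via a Cholesky factor.
    A = Z.T @ L @ Z
    B = Z.T @ D @ Z + 1e-9 * np.eye(r)
    Cinv = np.linalg.inv(np.linalg.cholesky(B))
    _, U = np.linalg.eigh(Cinv @ A @ Cinv.T)            # ascending eigenvalues
    return Z @ (Cinv.T @ U[:, :n_components])            # smallest-lam directions


def svc_sphere(Y, q, iters=2000, tol=1e-12):
    """Minimal enclosing sphere of phi(Y), Gaussian kernel, no outliers (C = 1).

    Dual: minimise beta^T K beta s.t. beta >= 0, sum(beta) = 1 (Frank-Wolfe).
    """
    K = gaussian_kernel(Y, Y, q)
    beta = np.full(len(Y), 1.0 / len(Y))
    for _ in range(iters):
        g = K @ beta
        j = int(np.argmin(g))
        gap = beta @ g - g[j]                            # Frank-Wolfe duality gap
        if gap < tol:
            break
        d = -beta
        d[j] += 1.0                                      # step towards vertex e_j
        dKd = d @ K @ d
        if dKd <= 0:
            break
        beta = beta + min(1.0, gap / dKd) * d            # exact line search
    bKb = beta @ K @ beta

    def r2(P):
        # Squared feature-space distance from phi(P) to the centre; K(x, x) = 1.
        return 1.0 - 2.0 * gaussian_kernel(P, Y, q) @ beta + bKb

    # With C = 1 every document is inside, so R^2 is the largest data distance.
    return r2, float(r2(Y).max())


def svc_labels(Y, r2, R2, n_seg=10, tol=1e-6):
    """Connected components of 'the joining segment stays inside the sphere'."""
    n = len(Y)
    ts = np.linspace(0.0, 1.0, n_seg)[:, None]
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            seg = Y[i] + ts * (Y[j] - Y[i])              # sampled segment points
            adj[i, j] = adj[j, i] = bool(np.all(r2(seg) <= R2 + tol))
    labels = np.full(n, -1)
    c = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = c
        stack = [start]
        while stack:                                     # depth-first search
            u = stack.pop()
            for v in np.flatnonzero(adj[u] & (labels < 0)):
                labels[v] = c
                stack.append(v)
        c += 1
    return labels


def cluster_documents(X, n_components=2, q=1.0, n_neighbors=5):
    E = lpi(X, n_components, n_neighbors)
    sd = E.std(axis=0)
    sd[sd == 0] = 1.0
    E = (E - E.mean(axis=0)) / sd                        # standardise so q is scale-free
    r2, R2 = svc_sphere(E, q)
    return svc_labels(E, r2, R2)
```

Calling `cluster_documents(X, n_components=2, q=1.0)` on a documents-by-terms matrix `X` returns one integer label per document. In support vector clustering, increasing the Gaussian width parameter `q` tends to split the data into more, smaller clusters, so `q` is the main knob here. Frank-Wolfe is used only to keep the sketch dependency-free; a general quadratic-programming solver would be the more typical choice for the sphere's dual.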