Conference: International Joint Conference on Neural Networks

Classifying web documents using term spectral transforms and Multi-Dimensional Latent Semantic representation

Abstract

This research investigates document semantic representations that consider both term frequencies and term associations. In particular, we propose a general framework that uses term spectra to represent the spatial distributions and associations of terms throughout a document. The term spectra we explore are computed with three typical techniques: the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). A term affinity graph is then established to represent each document. We then employ a document analysis method recently developed by the authors, Multi-Dimensional Latent Semantic Analysis (MDLSA), which lets us formulate an efficient semantic representation of a document based on its term affinity graph. We evaluate the algorithm on Web document classification. Experimental results demonstrate that the proposed technique is not only much more computationally efficient than Direct Graph Matching (DGM), but also outperforms state-of-the-art algorithms such as VSM, PCA, RAP, and MLM.
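
To make the idea of a term spectrum concrete, the sketch below builds a positional signal for one term and applies a DCT to it. The abstract does not specify how the signal is constructed, so the equal-width position bins, the function names `term_signal` and `dct2`, and the sample sentence are all assumptions made for illustration, not the authors' method.

```python
import math

def term_signal(tokens, term, n_bins=8):
    """Count occurrences of `term` in each of `n_bins` equal-width position bins.

    Assumption: binning by token position is one simple way to capture where
    a term appears in a document; the paper may construct its signal differently.
    """
    signal = [0.0] * n_bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            signal[i * n_bins // n] += 1.0
    return signal

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
        for k in range(N)
    ]

# Hypothetical sample document, tokenized on whitespace.
tokens = "web page ranking uses links and web graph structure for web search".split()
spectrum = dct2(term_signal(tokens, "web", n_bins=4))
print(spectrum)
```

In this unnormalized form, the first coefficient `spectrum[0]` is the sum of the signal and so equals the term's total frequency, while the higher coefficients describe how its occurrences are spread across the document. This illustrates how a spectrum can carry both frequency and positional information.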