
Dimension Reduction by Word Clustering with Semantic Distance


Abstract

In information retrieval, Latent Semantic Analysis (LSA) is a method for handling large, sparse document vectors. LSA reduces the dimension of document vectors by statistically deriving a set of topics related to the documents and terms. Because it is statistical, it needs a certain number of words, and it takes no account of the semantic relations between words. In this paper, we cluster words by their semantic distances, which reduces the dimension of document vectors to the number of word clusters. Word distances can be calculated using WordNet. The method does not depend on the number of words or documents. For especially small documents, we use the dictionary definitions of words to calculate the similarities between documents.
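The idea of reducing document vectors to word-cluster counts can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not name a clustering algorithm, so single-link agglomerative clustering with a distance threshold is assumed here. The vocabulary, the distance values in `TOY_DISTANCE`, and all function names are hypothetical stand-ins for distances that would in practice be derived from WordNet.

```python
import math
from collections import Counter

# Hypothetical toy distances standing in for WordNet-derived word distances
# (smaller means more similar). The values are invented for illustration.
VOCAB = ["car", "automobile", "truck", "apple", "banana", "fruit"]
TOY_DISTANCE = {
    frozenset(("car", "automobile")): 0.0,
    frozenset(("car", "truck")): 0.3,
    frozenset(("automobile", "truck")): 0.3,
    frozenset(("apple", "banana")): 0.4,
    frozenset(("apple", "fruit")): 0.2,
    frozenset(("banana", "fruit")): 0.2,
}
DEFAULT_DISTANCE = 1.0  # assumed distance for unrelated word pairs


def word_distance(a, b):
    """Look up the toy semantic distance between two words."""
    if a == b:
        return 0.0
    return TOY_DISTANCE.get(frozenset((a, b)), DEFAULT_DISTANCE)


def cluster_words(vocab, threshold):
    """Single-link agglomerative clustering (an assumed choice):
    repeatedly merge the two closest clusters until the closest
    pair is farther apart than `threshold`."""
    clusters = [[w] for w in vocab]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(word_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def reduce_document(tokens, clusters):
    """Map a tokenized document to a vector with one count per word cluster."""
    index = {w: k for k, members in enumerate(clusters) for w in members}
    counts = Counter(index[t] for t in tokens if t in index)
    return [counts.get(k, 0) for k in range(len(clusters))]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


clusters = cluster_words(VOCAB, threshold=0.5)
doc_a = reduce_document(["car", "truck", "apple"], clusters)
doc_b = reduce_document(["automobile", "banana", "fruit"], clusters)
similarity = cosine(doc_a, doc_b)
```

The two example documents share no words, so their similarity in the raw term space would be zero. In the cluster space they overlap, because each word is counted under its cluster rather than under itself, and the vector length equals the number of clusters rather than the vocabulary size.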
