International Journal of Computational Vision and Robotics

A Wikipedia-based semantic tensor space model for text analytics


Abstract

This paper proposes a third-order tensor space model for representing textual documents, which treats the 'concept' space as independent of the 'document' and 'term' spaces. In the vector space model (VSM), a document is represented as a vector in which each dimension corresponds to a term. In contrast, the model described here represents a document as a matrix. Most current text-mining algorithms take only vectors as their input and therefore suffer from the 'term independence' and 'loss of term senses' problems. To overcome these problems, we incorporate the 'concept' as a distinct space in the VSM. This requires producing a concept vector for each term that occurs in a given document, a task closely related to word sense disambiguation. As an external knowledge source for concept weighting, we employ Wikipedia, which has been evaluated as a source of world knowledge and used to improve many text-mining algorithms. Through experiments on two popular document corpora, we demonstrate the superiority of the model for text clustering and text classification.
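To make the document-as-matrix idea concrete, below is a minimal Python sketch of how a document-by-term-by-concept tensor could be assembled. The vocabulary, concept labels, and per-term concept weights here are hypothetical toy values standing in for the Wikipedia-derived concept vectors described in the abstract; this is an illustration of the representation, not the paper's actual weighting method.

```python
# Minimal sketch: each document becomes a term-by-concept matrix,
# and stacking documents yields a third-order tensor.
import numpy as np

# Toy vocabulary and concept inventory (hypothetical; the paper derives
# concepts and their weights from Wikipedia articles).
terms = ["bank", "river", "loan"]
concepts = ["Finance", "Geography"]

# Hypothetical concept vector per term, i.e. how strongly each term
# relates to each concept; this is where word sense disambiguation enters.
concept_weights = {
    "bank":  np.array([0.7, 0.3]),   # ambiguous: financial vs. river bank
    "river": np.array([0.0, 1.0]),
    "loan":  np.array([1.0, 0.0]),
}

def document_matrix(token_counts):
    """Represent one document as a term-by-concept matrix:
    each row spreads the term's frequency over its concept vector."""
    m = np.zeros((len(terms), len(concepts)))
    for i, t in enumerate(terms):
        m[i] = token_counts.get(t, 0) * concept_weights[t]
    return m

# Two toy documents given as term counts.
docs = [{"bank": 2, "loan": 1}, {"bank": 1, "river": 3}]

# Stacking the per-document matrices yields the third-order tensor:
# documents x terms x concepts.
tensor = np.stack([document_matrix(d) for d in docs])
print(tensor.shape)  # (2, 3, 2)
```

In contrast to a VSM vector, the extra concept mode lets two documents that share the surface term "bank" be distinguished by which concept dimension carries the weight.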
