Conference: ACM Conference on Information and Knowledge Management

Learning Ontology Resolution for Document Representation and its Applications in Text Mining

Abstract

It is well known that synonymous and polysemous terms often introduce noise when calculating the similarity between documents. Existing ontology-based document representation methods are static: the semantic concept set chosen to represent a document has a fixed resolution and cannot adapt to the characteristics of a document collection or to the text mining problem at hand. We propose an Adaptive Concept Resolution (ACR) model to overcome this issue. ACR learns a concept border from an ontology, taking into account the characteristics of a particular document collection. This border then provides a tailor-made semantic concept representation for any document from the same domain. Another advantage of ACR is that it applies both to classification tasks, where the groups are given in the training document set, and to clustering tasks, where no group information is available. Furthermore, the model's results are not sensitive to the model parameter. The experimental results show that ACR significantly outperforms an existing static method.
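To make the idea of a concept border more concrete, the following is a minimal sketch of how a document might be represented once a border has been chosen. It does not reproduce how ACR learns the border, which the abstract does not describe. The toy ontology, the hand-picked borders, and the names `PARENT`, `to_border_concept`, and `represent` are all assumptions made for illustration.

```python
from collections import Counter

# Toy ontology as child -> parent links (hypothetical; not from the paper).
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "puppy": "dog",
    "car": "vehicle",
    "automobile": "car",
    "truck": "vehicle",
}


def to_border_concept(term, border):
    """Walk up from `term` until a concept on the border is reached."""
    node = term
    while node is not None:
        if node in border:
            return node
        node = PARENT.get(node)
    return None  # term lies above the border or outside the ontology


def represent(tokens, border):
    """Bag-of-concepts counts for a tokenized document."""
    concepts = (to_border_concept(t, border) for t in tokens)
    return Counter(c for c in concepts if c is not None)


doc = ["puppy", "dog", "automobile", "truck", "weather"]

# Two hand-picked borders (assumed): a coarse one and a finer one.
coarse = represent(doc, border={"animal", "vehicle"})
fine = represent(doc, border={"dog", "cat", "car", "truck"})
```

With the coarse border, "puppy" and "dog" collapse into one concept, as do "automobile" and "truck". The finer border keeps "car" and "truck" apart. Choosing where that border sits for a given collection is the part that, according to the abstract, ACR learns rather than fixes in advance.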
