CIKM 10; ACM conference on information and knowledge management

Learning Ontology Resolution for Document Representation and its Applications in Text Mining


Abstract

It is well known that synonymous and polysemous terms often introduce noise when calculating the similarity between documents. Existing ontology-based document representation methods are static: the semantic concept set chosen to represent a document has a fixed resolution and cannot adapt to the characteristics of a document collection or to the text mining problem at hand. We propose an Adaptive Concept Resolution (ACR) model to overcome this issue. ACR learns a concept border from an ontology, taking into consideration the characteristics of a particular document collection. This border then provides a tailor-made semantic concept representation for a document from the same domain. Another advantage of ACR is that it is applicable both to classification tasks, where the groups are given in the training document set, and to clustering tasks, where no group information is available. Furthermore, the result of the model is not sensitive to the model parameter. The experimental results show that ACR significantly outperforms an existing static method.
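To make the idea of a concept border concrete, here is a minimal sketch of how a border through an ontology could set the resolution of a document representation. It is not the ACR model: the abstract does not describe how the border is learned, so the border is simply given by hand. The toy ontology `PARENT` and the helper names `lift_to_border` and `represent` are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption, not from the paper):
# each concept maps to its parent; None marks the root.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "car": "vehicle",
    "truck": "vehicle",
}


def lift_to_border(concept, border, parent=PARENT):
    """Walk up from `concept` to the first concept that lies on the border.

    If neither the concept nor any ancestor is on the border,
    the concept is returned unchanged.
    """
    node = concept
    while node is not None:
        if node in border:
            return node
        node = parent[node]
    return concept


def represent(doc_concepts, border):
    """Count the document's concepts after lifting each one to the border."""
    return Counter(lift_to_border(c, border) for c in doc_concepts)


# Concepts assumed to have already been extracted from one document.
doc = ["poodle", "cat", "car", "truck", "dog"]

# A coarse border merges all animals; a finer one keeps dogs and cats apart.
coarse = represent(doc, {"animal", "vehicle"})
fine = represent(doc, {"dog", "cat", "vehicle"})
```

The same document yields different concept vectors under the two borders, which is the sense in which a border fixes the "resolution" of the representation. A static method would use one such border for every collection, whereas the abstract says ACR learns it from the collection.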
