Conference: European Conference on Principles and Practice of Knowledge Discovery in Databases

Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification


Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance on the text classification task by extending the traditional "bag of words" representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various manually disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora using the SVM algorithm and achieve a systematic improvement in classification accuracy, especially when the training set is small.
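
The abstract does not give the form of the semantic kernel, so the following is only a minimal sketch of the general GVSM-style idea it refers to: documents are mapped from term space into a concept space before the inner product is taken, k(d1, d2) = d1^T P P^T d2. The vocabulary `TERMS`, the term-to-concept matrix `P`, and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List

# Hypothetical toy vocabulary and term-to-concept matrix P (assumption,
# not from the paper). Rows are terms, columns are thesaurus concepts.
TERMS: List[str] = ["car", "automobile", "banana"]
P: List[List[float]] = [
    [1.0, 0.0],  # car        -> concept "vehicle"
    [1.0, 0.0],  # automobile -> concept "vehicle"
    [0.0, 1.0],  # banana     -> concept "fruit"
]


def bag_of_words(tokens: List[str]) -> List[float]:
    """Term-frequency vector over TERMS; tokens outside TERMS are ignored."""
    return [float(tokens.count(term)) for term in TERMS]


def to_concepts(d: List[float]) -> List[float]:
    """Project a term vector into concept space, i.e. compute P^T d."""
    n_concepts = len(P[0])
    return [
        sum(d[i] * P[i][j] for i in range(len(d)))
        for j in range(n_concepts)
    ]


def linear_kernel(d1: List[float], d2: List[float]) -> float:
    """Plain bag-of-words inner product d1^T d2."""
    return sum(a * b for a, b in zip(d1, d2))


def gvsm_kernel(d1: List[float], d2: List[float]) -> float:
    """GVSM-style kernel d1^T P P^T d2, computed in concept space."""
    return linear_kernel(to_concepts(d1), to_concepts(d2))


if __name__ == "__main__":
    doc_a = bag_of_words(["car"])
    doc_b = bag_of_words(["automobile"])
    # The two documents share no terms but do share a concept.
    print(linear_kernel(doc_a, doc_b), gvsm_kernel(doc_a, doc_b))
```

In this toy setup the two documents have no terms in common, so the plain inner product is zero, while the concept-space kernel is positive because both terms map to the same concept. A kernel of this shape could be passed to an SVM as a precomputed Gram matrix; how the paper actually derives its matrix from the thesaurus is not stated in the abstract.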
